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Abstract 

The activity of collections of synchronizing neurons can be represented by weakly coupled nonlinear 
phase oscillators satisfying Kuramoto's equations. In this article, we build such neural-oscillator mod- 
els, partly based on neurophysiological evidence, to represent approximately the learning behavior 
predicted and confirmed in three experiments by well-known stochastic learning models of behavioral 
stimulus-response theory. We use three Kuramoto oscillators to model a continuum of responses, and 
we provide detailed numerical simulations and analysis of the three-oscillator Kuramoto problem, in- 
cluding an analysis of the stability points for different coupling conditions. We show that the oscillator 
simulation data are well-matched to the behavioral data of the three experiments. 

Keywords: learning; neural oscillators; three-oscillator Kuramoto model; stability points of the 
Kuramoto model; stimulus-response theory; phase representation; continuum of responses 



1. Introduction 



With the advent of modern experimental and computational techniques, substantial progress has 
been made in understanding the brain's mechanisms for learning. For example, extensive computa- 
tional packages based on detailed experimental data on ion-channel dynamics make it now possible to 
simulate the behavior of networks of neurons starting from the flow of ions through neurons' mem- 
branes (Bower and Beeman 20031. Yet, there is much to be determined at a system level about brain 
processes. This can be contrasted to the large literature in psychology on learning, psychophysics, 
and perception. Among the most developed behavioral mathematical models of learning are those of 



stimulus-response (SR) theory (Estes 1950 1959 Suppes 1959 1960 Suppes and Atkinson 1960 



Suppes and Ginsberg[ 1963[ ). (From the large SR literature, we reference here only the results we use 



in some detail.) In this article, we propose weakly coupled phase oscillators as neural models to do 
the brain computations needed for stimulus-response conditioning and response conditioning. We then 
use such oscillators to analyze two experiments on noncontingent probabilistic reinforcement, the first 
with a continuum of responses and a second with just two. A third experiment is on paired-associate 
learning. 

The oscillator assumptions we make are broadly based on neurophysiological evidence that neural 
oscillators, made up of collections of synchronized neurons, are apparently ubiquitous in the brain. 



Their oscillations are macroscopically observable in electroencephalograms (Freeman 1979 Gerstner 



and Kistler 2002 Wright and Liley 19951. Detailed theoretical analyses have shown that weakly 



interacting neurons close to a bifurcation exhibit such oscillations (Gerstner and Kistler 2002 Hop- 



pensteadt and Izhikevich 1996a|b[ Izhikevich 2007). Moreover, many experiments not only provide 
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evidence of the presence of oscillators in the brain ( Eckhorn et al.l 1988'; 'Friedrich et al. 2004' 'Kazant- 



sev et al.| p004| |Lutz etall |2002| [Murthy and Fetz| p^92| |Rees et al. 2002, [Rodriguez et al., ,1999 
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2000 



Tallon-Baudry et al. 



show that their synchronization is related to perceptual processing (Friedrich et al. 



et al. 2004 Leznik et al. 2002 Murthy and Fetz 1992 Sompolinsky et al. 19901. They may play 



a role in solving the binding problem (Eckhorn et al. 19881. More generally, neural oscillators have 



2001 



Wang 



1995 



2004 



but also 



Kazantsev 



already been used to model a wide range of brain functions, such as pyramidal cells (Lytton and Se- 



jnowski 1991 ), effects of electric fields in epilepsy ( [Park et al. 2003 ), activities in the cat visual cortex 



(Sompolinsky et al. 19901, learning of songs by birds (Trevisan et al. 2005), learning (Vassilieva et al. 



20111, and coordinated finger tapping (Yamanishi et al. 1980 1. Suppes and Han (20001 showed that 



a small number of frequencies can be used to recognize a verbal stimulus from EEG data, consistent 
with the brain representation of language being neural oscillators. From a very different perspective, 
in Billock and Tsou (2005) synchronizing neural oscillators were used to rescale sensory information, 



and Billock and Tsou 



(2011 1 used synchronizing coupled neural oscillators to model Steven's law, thus 
suggesting that neural synchronization is relevant to cognitive processing in the brain. 

Our working hypothesis is that main cognitive computations needed by the brain for the SR ex- 
periments we consider can be modeled by weakly coupled phase oscillators that together satisfy Ku- 
ramoto's nonlinear equations ( Acebron et al. 2005 Hoppensteadt and Izhikevich 1996a|b Kuramoto 



1984 Strogatz 2000 Winfree 



20021 



When a stimulus is sampled, its corresponding neural oscilla- 
tor usually phase locks to the response-computation oscillators. When a reinforcement occurs, the 
coupling strengths will often change, driven by the reinforcement oscillator. Thus, SR conditioning is 
represented by phase locking, driven by pairwise oscillator couplings, whose changes reflect learning. 

Despite the large number of models of processing in the brain, we know of no detailed systematic 
efforts to fit neural oscillator models to behavioral learning data. We choose to begin with SR theory 
for several reasons. This theory has a solid mathematical foundation ('Suppes 2002); it has been used 



to predict many non trivial quantitative features of learning, especially in the form of experimentally 
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observed conditional probabihties (Bower 1961 Suppes and Atkinson 1960 Suppes and Ginsberg 



It can also be used to represent computational structures, such as finite 



our treatment of a continuum of responses below) , they may provide a basis for quantum-like behavior 
in the brain ( de Barros and Suppes[ 2009 Suppes and de Barros 2007 1 , an area of considerable research 



in recent years (Bruza et al. 2009). 

In Section [2 we start with a brief review of SR theory, followed by a detailed stochastic process 
version of an SR model for a continuum of responses. In Section [4] we present in some detail the neural 
oscillator computation for the SR continuum model. Extension to other SR models and experiments 
follows, as already remarked. In Section[5]we compare the computations of the oscillator simulations to 
empirical data from three behavioral experiments designed to test the predictions of stimulus-response 
theory. 



2. Stimulus-Response Theory 



The first aim of the present investigation is to use the extension of stimulus-response theory to 
situations involving a continuum of possible responses, because the continuum case most closely corre- 
sponds to the physically natural continuous nature of oscillators. The SR theory for a finite number of 



(Suppes 1960). 



responses stems from the basic paper by (Estes 1950); the present formulation resembles most closely 
that given for the finite case in Suppes and Atkinson (1960 Chapter 1), and in the continuum case 



The general experimental situation consists of a sequence of trials. On each trial the participant 
(in the experiment) makes a response from a continuum or finite set of possible responses; his response 
is followed by a reinforcing event indicating the correct response for that trial. In situations of simple 
learning, which are characterized by a constant stimulating situation, responses and reinforcements 
constitute the only observable data, but stimulus-response theory postulates a considerably more 
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complicated process which involves the conditioning and sampling of stimuli, which are best interpreted 
as patterns of stimuli, with a single pattern being sampled on each trial. In the finite case the usual 
assumption is that on any trial each stimulus is conditioned to exactly one response. Such a highly 
discontinuous assumption seems inappropriate for a continuum or a large finite set of responses, and 
it is replaced with the general postulate that the conditioning of each stimulus is smeared over a set 
of responses, possibly the whole continuum. In these terms, the conditioning of any stimulus may 
be represented uniquely by a smearing probability distribution, which we also call the conditioning 
distribution of the stimulus. 

The theoretically assumed sequence of events on any trial may then be described as follows: 

trial begins response occurs, 

with each a being drawn from possible 

stimulus in a stimulus the conditioning reinforcement change in 

a certain (pattern) is distribution occurs conditioning 

state of sampled of the sampled occurs, 

conditioning stimulus 

The sequence of events just described is, in broad terms, postulated to be the same for finite and 
infinite sets of possible responses. Differences of detail will become clear. The main point of the axioms 
is to formulate verbally the general theory. As has already been more or less indicated, three kinds 
of axioms are needed: conditioning, sampling, and response axioms. After this development, more 
technically formulated probabilistic models for the different kind of experiments are given. 

2.1. General Axioms 

The axioms are formulated verbally but with some effort to convey a sense of formal precision. 
Conditioning Axioms. 

CI. For each stimulus s there is on every trial a unique conditioning distribution, called the smearing 
distribution, which is a probability distribution on the set of possible responses. 

C2. If a stimulus is saniplcci on a trial, the mode of its snicariiig distribution becomes, with probability 
9, the point (if any) which is reinforced on that trial; with probability 1—9 the mode remains unchanged. 

C3. If no reinforcement occurs on a trial, there is no change in the smearing distribution of the sampled 
stimulus. 

C4. Stimuli which are not sampled on a given trial do not change their smearing distributions on that 
trial. 

C5. The probability 9 that the mode of the smearing distribution of a sampled stimulus will become 
the point of the reinforced response is independent of the trial number and the preceding pattern of 
occurrence of events. 

Sampling Axioms. 

51. Exactly one stimulus is sampled on each trial. 

52. Given the set of stimuli available for sampHng on a given trial, the probability of sampling a given 
element is independent of the trial number and the preceding pattern of occurrence of events. 
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Response Axioms. 



Rl. The probability of the response on a trial is solely determined by the smearing distribution of the 
sampled stimulus. 

These axioms are meant to make explicit the conceptual framework of the sequence of events 
postulated to occur on each trial, as already described. 

2.2. Stochastic Model for a Continuum of Responses 

Four random variables characterize the stochastic model of the continuum-of-responses experiment: 
ft is the probability space, P is its probability measure, S is the set of stimuli (really patterns of 
stimuli), R is the set of possible responses, E is the set of reinforcements, and S, X, and Y, are the 
corresponding random variables, printed in boldface. The other random variable is the conditioning 
random variable Z, which carries the essential information in a single parameter z„. It is assumed 
that on each trial the conditioning of a stimulus s in S* is a probability distribution Kg {r\z) , which is 
a distribution on possible responses x in R. So, in the experiments analyzed here, the set of responses 
R is just the set of real numbers in the periodic interval [0, 2it]. It is assumed that this distribution 
Kg {r\z) has a constant variance on all trials and is defined by its mode z and variance. The mode 
changes from trial to trial, following the pattern of reinforcements. This qualitative formulation is 
made precise in what follows. The essential point is this. Since only the mode is changing, we can 
represent the conditioning on each trial by the random variable Z. So Z,5.„ = Zs.n says what the mode 
of the conditioning distribution Kg {r\zn) for stimulus ,s is on trial n. We subscript the notation z„ 
with 3 as well, i.e., Zs_n when needed. Formally, on each trial n, z„ is the vector of modes (zi, • • • , Zm) 
for the m stimuli in S. So fl — {il, P, S, R, E) is the basic structure of the model. 

Although experimental data will be described later in this paper, it will perhaps help to give a 
schematic account of the apparatus which has been used to test the SR theory extensively in the case 
of a continuum of responses. A participant is seated facing a large circular vertical disc. He is told that 
his task on each trial is to predict by means of a pointer where a spot of light will appear on the rim 
of the disc. The participant's pointer predictions are his responses in the sense of the theory. At the 
end of each trial the "correct" position of the spot is shown to the participant, which is the reinforcing 
event for that trial. 

The most important variable controlled by the experimenter is the choice of a particular reinforce- 
ment probability distribution. Here this is the noncontingent reinforcement distribution F{y), with 
density / (y) , on the set R of possible responses, and "noncontingent" means that the distribution of 
reinforcements is not contingent on the actual responses of the participant. 

There are four basic assumptions defining the SR stochastic model of this experiment. 

Al. If the set S of stimuli has m elements, then P(S„ = slse^*) = — . 

m 

A2. If a2 — ai < 27r then P (ai < X„ < a2|S„ = s, 'Ls,n = z^ = J^^ kg {x\y) dx. 

A3, (i) P(Zs,„+i = y I S„ = s, Y„ = y&Zs,„ = z.^n) = 6 
and 

(ii) P{Zs,n + l = Zs,n I S„ = S, Y„ = t/ & Z^,„ = Zs^n) = (1 " ^) • 

A4. The temporal sequence in a trial is: 

Z„ S„ X„ — )• Y„ Z„_|_i. (1) 

Assumption Al defines the sampling distribution, which is left open in Axioms SI and S2. As- 
sumptions A2 and A3 complement Axioms CI and C2. The remaining conditioning axioms, C3, C4, 
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and C5, have the general form given earher. The same is true of the two samphng axioms. The two 
axioms C5 and S2 are just independence- of-path assumptions. These axioms are crucial in proving that 
for simple reinforcement schedules the sequence of random variables which take as values the modes of 
the smearing distributions of the sampled stimuli constitutes a continuous-state discrete-trial Markov 
process. 

For example, using J„ for the joint distribution of any finite sequence of these random variables 
and jn for the corresponding density, 

P (ai ^ X„ ^ a2|S„ = s, Zs ,„ = z) = I j„ {x\s, z) dx = Kg (02; z) - Kg (ai; z) . 

J ai 

The following obvious relations for the response density r„ of the distribution i?„ will also be helpful 
later. First, we have that 

i.e., r„ is just the marginal density obtained from the joint distribution j„. Second, we have "expan- 
sions" like 

/.27r 







and 



nllX pin nl'K 

• n 1) dZs.ndl/n-ldXn-l- 

Jo Jo Jo 

2.3. Noncontigent Reinforcement 

Noncontingent reinforcement schedules are those for which the distribution is independent of n, 
the responses, and the past. We first use the response density recursion for some simple, useful results 
which do not explicitly involve the smearing distribution of the single stimulus. There is, however, one 
necessary preliminary concerning derivation of the asymptotic response distribution in the stimulus- 
response theory. 

Theorem 1. In the noncontingent case, if the set of stimuli has m elements, 

1 f^'' 

r (x) = lim r„ (a;) = — / h (x; y) f (y) dy. (2) 

We now use ^ to establish the following recursions. In the statement of the theorem i<^(X„) is 
the expectation of the response random variable X„, /i,. (X„) is its r-th raw moment, cr^ (X„) is its 
variance, and X is the random variable with response density r. 

Theorem 2. 

r„+i (x) = {l-0) r„ (x) + 6r (x) , (3) 
E{X^+i)^{l-e)E{Xn) + eE{X), (4) 

Hr (X„+i) = (1 - 6*) Hr (X„) -I- 9 fir (X) , (5) 

(X„+i) = {l-e) (X„) + ea^ (X) +9{l-e)[E (X„) - E (X)]' (6) 

Because (|3|-([5| are first-order difference equations with constant coefficients we have as an imme- 
diate consequence of the theorem: 
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Corollary 1. 

r„ (x) =r{x)- [r (x) - n (x)] (1 - 0)""^ , (7) 

E (x„) = (X) - [i? (X) - E (Xi)] (1 - ef-' , (8) 

Ai. (X„) = fir (X) - (X) - fir (Xi)] (1 - 0)"^^ (9) 

Although the one-stimulus model and the A^-stimulus model both yield ([3])-([9]), predictions of the 
two models are already different for one of the simplest sequential statistics, namely, the probability 
of two successive responses in the same or different subintervals. We have the following two theorems 
for the one-stimulus model. The result generalizes directly to any finite number of subintervals. 

Theorem 3. For noncontingent reinforcement 
lim F (0 ^ X„+i ^ c, ^ X„ ^ c) 



eR{c^^ + {l-e)\^^ [ [ [ ks{x; z)k,,{x'; z)f{z)dxdx'dz, (10) 

„/,c „,c. "'0 -'0 "'0 



s'eS seS 

and 

lim F (0 ^ X„+i ^ c, c ^ X„ ^ 2tt) 

n— >-oo 

rc r27r /.27r 



= eR(c)\l- R(c)] + (l-9)^y^y^ / / ks(x; z)ks' (x; z) f (z)dxdxdz, (11) 
where 

R (c) = lim i?„ (c) . 

We conclude the treatment of noncontingent reinforcement with two expressions dealing with im- 
portant sequential properties. The first gives the probability of a response in the interval [ai, 02] given 
that on the previous trial the reinforcing event occurred in the interval [61, 62]- 

Theorem 4. 

P (ai ^ X„+i ^ a2|6i ^Yn^ 62, 03 ^ X„ ^ 04) 

= (1 - (?) [i?„ (02) - Rn (ai)] + f (5^) (fc^) E / " / ' y) / (y) ^y- (12) 

The expression to which we now turn gives the probability of a response in the interval [oi, 02] 
given that on the previous trial the reinforcing event occurred in the interval [61, 62] and the response 
in the interval [03, 04]. 

Theorem 5. 

P (ai 5^ X„+i ^ a2 I &i ^ Yn ^ &2, as ^ X„ ^ 04) 

/'27r /"aQ 



[1-9) 1 



/•.^ITT rCLA 

/ / / ks {x; z) ks' (x \ z) Qn {z) dx dx dz. 



i?„ (04) - i?„ (03) m2 ^0 

9 

^ F{h)-F{h)mj^J,^ A 



rE / / ks{x] z) f {y)dxdy. (13) 
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2.4- More General Comments on SR theory 

The first general comment concerns the sequence of events occurring on each trial, represented 
earlier by equation ([ij in Assumption A4 of the stochastic model. We want to examine in a preliminary 
way when brain computations, and therefore neural oscillators, are required in this temporal sequence. 

(i) Z„ sums up previous conditioning and does not represent a computation on trial n; 

(ii) S„, which represents the experimentally unobserved sampling of stimuli, really patterns of stim- 

uli, uses an assumption about the number of stimuli, or patterns of them, being sampled in an 
experiment; the uniform distribution assumed for this sampling is artificially simple, but com- 
putationally useful; in any case, no brain computations are modeled by oscillators here, even 
though a complete theory would be required; 

( in ) X„ represents the first brain computation in the temporal sequence on a trial for the stochastic 
model; this computation selects the actual response on the trial from the conditioning distribution 
{x\zg^n)i where s is the sampled stimulus on this trial (S„ ~ s) ; this is one of the two key 
oscillator computations developed in the next section; 

(iv) Y„ is the reinforcement random variable whose distribution is part of the experimental design; 

individual reinforcements are external events totally controlled by the experimenter, and as such 
require no brain computation by the participant; 

(v) Z„_(_i summarizes the assumed brain computations that often change at the end of a trial the 

state of conditioning of the stimulus s sampled on trial n; in our stochastic model, this change in 
conditioning is represented by a change in the mode Zs_„ of the distribution (x\zs^n)- From the 
assumptions Al-4 and the general independence-of-path axioms, we can prove that the sequence 



of random variable Zi, . . . , Z„, . . . is a first-order Markov process (Suppes 1960). 



The second general topic is of a different sort. It concerns the important concept of activation, which 
is a crucial but implicit aspect of SR theory. Intuitively, when a stimulus is sampled, an image of that 
stimulus is activated in the brain, and when a stimulus is not sampled on a trial, its conditioning does 
not change, which implies a constant state of low or no activity for the brain image of that unsampled 
stimulus. 

Why is this concept of activation important for the oscillator or other physical representations of 
how SR theory is realized in the brain? Perhaps the most fundamental reason arises from our general 
conception of neural networks, or, to put it more generally, "purposive" networks. For large networks, 
i.e., ones with many nodes, it is almost always unrealistic to have all nodes active all of the time. 
For biological systems such an inactive state with low consumption of energy is necessary in many 
situations. The brain certainly seems to be a salient example. Without this operational distinction, 
imagine a human brain in which episodic memories of many past years were as salient and active 
as incoming perception images. This critical distinction in large-scale networks between active and 
inactive nodes is often ignored in the large literature on artificial neural networks, but its biological 
necessity has long been recognized, and theoretically emphasized in psychology since the 1920s. In 
this paper, we accept the importance of activation, but do not attempt an oscillator representation of 
the physical mechanism of activating brain images arising from perceived stimuli. This also applies to 
the closely related concept of spreading activation, which refers to brain images perceptually activated 
associatively by other brain images, which are not themselves directly activated by perception. 



3. Model overview 

The neural oscillator model we developed is significantly more complex, both mathematically and 
conceptually, than SR theory. Furthermore, it requires the use of some physical concepts that are 
probably unfamiliar to some readers of this journal. So, before we present the mathematical details of 
the model in Section [4] we introduce here the main ideas behind the oscillator model in a conceptual 
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way, and show each step of the computation and how it relates to SR theory. Schematically, the 
oscillator model corresponds to the following steps, described in more detail later in this section. 

1. Trial n starts with a series of stimulus and response neural oscillators connected through excita- 
tory and inhibitory couplings. 

2. A stimulus oscillator is activated, and by spreading activation the response oscillators. 

3. Activation leads to new initial conditions for the stimulus and response oscillators; we assume 
such conditions are normally distributed. 

4. The stimulus and response neural oscillators evolve according to non-linear deterministic differ- 
ential equations. 

5. After a response time At^, relative phase relations between the stimulus and response oscillators 
determine the response made on trial n. 

6. The reinforcement oscillator is then activated at time te after the response has occured. 

7. Activation of the reinforcement oscillator leads to other new "initial" conditions at t^', such 
conditions we again assume are normally distributed. 

8. The stimulus, response, and reinforcement neural oscillators, as well as their couplings, evolve 
according to nonlinear deterministic differential equations. 

9. After a time interval Ate, reinforcement is completed, and the excitatory and inhibitory couplings 
may have changed. 

Step 1 corresponds to the random variable Z„ in equation ([ij. Step 2 to S„, Steps 3-5 to X„, and 
Steps 6-9 to Y„ and Z„+i. Let us now look at each process in more detail, from a neural point of 
view. 

Sampling. We start the description of our model by the sampling of a stimulus, S„. For each element 
s of the set of stimuli S, we assume the existence of a corresponding neural phase oscillator, (fig. The 
sampling of a specific s thus activates the neural oscillator (ps- As mentioned before, we do not present 
a detailed theory of activation or spreading activation, but simply assume that for each trial a ifs is 
activated according to a uniform distribution, consistent with S„. Once an oscillator ifg is activated, 
this activation spreads to those oscillators coupled to it, including the response oscillators (pr-^ and ipr^- 

Response. In our model, we simplify the dynamics by considering only two response oscillators, ip^ 
and (pr2 ■ We should emphasize that, even though we are talking about two response oscillators, we are 
not thinking of them as modeling two responses, but a continuum of responses. Intuitively, we think 
of as corresponding to a certain extreme value in a range of responses, for example 1, and (^^2 to 
another value, say — 

Couplings between the stimulus and response oscillators can be of two types: excitatory or in- 
hibitory. Excitatory couplings have the effect of promoting the synchronization of two oscillators in a 
way that brings their phases together. Inhibitory couplings also promote synchronization, but in such 
a way that the relative phases are off by tt. If oscillators are not at all coupled, then their dynamical 
evolution is dictated by their natural frequencies, and no synchronization appears. Figure [T| shows the 
evolution for three uncoupled oscillators. However, if a system starts with oscillators randomly close 
to each other in phase, when oscillators are coupled, some may evolve to be in-phase or out-of-phase. 

To describe how a response is computed, we first discuss the interference of neural oscillators. At 
the brain region associated to , we have the oscillatory activities due not only to but also to (ps , 
since the stimulus oscillator is coupled to tpn ■ This sum of oscillations may result in either constructive 
or destructive interference. Constructive interference will lead to stronger oscillations at ipr-^ , whereas 
destructive interference will lead to weaker oscillations. So, let us assume that the couplings between 
(pg and the response oscillators are such that the stimulus oscillator synchronizes in-phase with ip^-^ 



In reality, responses in our model have the same topological structure as the unit circle. See SectionWlfor details. 
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(a) 



40 




Figure 1: Time evolution of three oscillators represented by A(t) = cos {(p{t)), where ip{t) is the phase. Graph (a) shows 
the oscillations for uncoupled oscillators with frequencies 11 Hz (solid line), 10 Hz (dashed), and 9 Hz (dash-dotted). 
Notice that though the oscillators start roughly in phase, they slowly dephase as time progresses. Graph (b) shows the 
phases ip{t) of the same three oscillators in (a). 



and out-of-phase with ipr^ ■ Synchronized in-phase oscillators correspond to higher intensity than out- 
of-phase oscillators. The dynamics of the system is such that its evolution leads to a high intensity for 
the superposition of the stimulus ips and response oscillator ipr-^ (constructive interference), and a low 
intensity for the superposition of ips and (fr2 (destructive interference). Thus, in such a case, we say 
that the response will be closer to the one associated to (pn, i.e., 1. Figure [2] shows the evolution of 
coupled oscillators, with couplings that lead to the selection of response 1. 

From the previous paragraph, it is clear how to obtain answers at the extremes values of a scale (—1 
and 1 in our example). However, the question remains on how to model a continuum of responses. To 
examine this, let us call Ii the intensity at (/j^ due to its superposition with ip^, and let us call I2 the 
intensity for tp^^ and ips ■ Response 1 is computed from the oscillator model when Ii is maximum and I2 
is minimum, and —1 vice versa. But those are not the only possibilities, as the phase relations between 
ips and the response oscillators do not need to be such that when Ii is maximum I2 is minimum. 
For example, it is possible to have Ii — l2- Since we are parametrizing 1 (—1) to correspond to a 
maximum for Ii {I2) and a minimum for I2 (/i), it stands to reason that Ii = I2 corresponds to 0. 
More generally, by considering the relative intensity between Ii and I2, defined by 



which can be rewritten as 



h/h - 1 



(14) 



if I2 ^ 0, 



/1//2 + I 

any value between —1 and 1 may be computed by the oscillator model. For example, when I2 = (l/3)/i. 



we see from equation (14) that the computed response would be 1/2, whereas I2 = 3/i would yield 
— 1/2. We emphasize, though, that there is nothing special about —1 and 1, as those were values that 
we selected to parametrize the response; we could have selected a and /3 as the range for our continuum 



of responses by simply redefining h in Eq. ( 14 1 
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Figure 2: Same three oscillators as in Figure^ but now coupled. We see from (a) that even though the system starts 
with no synchrony, as oscillators have different phases and natural frequencies, very quickly (after 100 ms) the stimulus 
oscillator s (dash-dotted line) becomes in synch and in phase with response ri (solid line), while at the same time 
becoming in synch but off phase with r2- This is also represented in (b) by having one of the oscillators approach the 
trajectory of the other. Graph (c) shows the phase differences between the stimulus oscillators s and ri (solid line) and 
s and r2 (dashed line). Finally, (d) shows the relative intensity of the superposition of oscillators, with 1 corresponding 
to maximum intensity at ri and minimum at r2 and —1 corresponding to a maximum in r2 and minimum in ri. As we 
can see, the couplings lead to a response that converges to a value close to 1. 



10 



0.05 0.1 0.15 0.2 

Time (s) 



Figure 3: Response computation trajectories for 100 randomly-selected initial conditions for an oscillator set conditioned 
to the response 1/2. 



It can now be seen where the stochastic characteristics of the model appear. First, Step 2 above 
corresponds to a random choice of stimulus oscillator, external to the model. Once this oscillator 
is chosen, the initial conditions for oscillators in Step 3 have a stochastic nature. However, once 
the initial conditions are selected, the system evolves following a deterministic (albeit nonlinear) set 
of differential equations, as Step 4. Finally, from the deterministic evolution from stochastic initial 
conditions, responses are computed in Step 5, and we see in Figure [3] an ensemble of trajectories for 
the evolution of the relative intensity of a set of oscillators. It is clear from this figure that there is a 
smearing distribution of computed responses at time 0.2 s. This smearing is smaller than the empirical 
value reported in Suppes et al. (1964). But this is consistent with the fact that the production of a 
response by the perceptual and motor system adds additional uncertainty to produce a higher variance 
of the behavioral smearing distribution. 



Reinforcement. Reinforcement of y is neurally described by an oscillator ip^^y that couples to the 
response oscillators with phase relations consistent with the relative intensities of ipr-^ and ipr2 producing 
the response y. The conditioning state of the oscillator system is coded in the couplings between 
oscillators. Once the reinforcement oscillator is activated, couplings between stimulus and response 
oscillators change in a way consistent with the reinforced phase relations. For example, if the intensity 
were greater at (fr^ but we reinforced , then the couplings would evolve to push the system toward 
a closer synchronization with ipr-^ . Figure [5] shows the dynamics of oscillators and couplings during an 
effective reinforcement, and Figure [2] shows the selection of a response for the sampling of the same 
set of oscillators after the effective reinforcement. 



4. Oscillator Model for SR-theory 



Now that we have presented the oscillator model from a conceptual point of view, let us look at 



the mathematical details of the model. As mentioned in (iii) and (v) of Section 2.4 our goal in this 



paper is to model the stochastic processes described by the random variables X„ and Z„ in terms 
of oscillators. So, let us start with the oscillator computation of a response, corresponding to X„. 
When we think about neural oscillators, we visualize groups of self-sustaining neurons that have some 
coherence and periodicity in their firing patterns. Here, we assume one of those neural oscillators 
corresponds to the sampled stimulus s. We will describe below how two neural oscillators ri and r2 
can model the computation of randomly selecting response from a continuum of possible responses 
in accordance with a given probability distribution. A way to approach this is to consider the three 
harmonic oscillators, s, ri, and r2, which for simplicity of computation we chose as having the same 
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Figure 4: (a) Dynamics of coupled oscillators for the first 6 X 10^'^ seconds of reinforcement. Because reinforcement 
couplings are strong, oscillators quickly converge to an asymptotic state with fixed frequencies and phase relations. 
Graph (b) shows the quick convergence to the phase relations — 7r/2 and 7r/2 for ri and r2 with respect to s. Lines in (a) 
and (b) correspond to the same as Figures^ and [2] Graphs (c), (d), (e) and (f) show the evolution of the excitatory and 
inhibitory couplings. Because the system is being reinforced with the phase differences shown in (b), after reinforcement, 
if the same oscillators are sampled, we should expect a response close to zero in the —1 and 1 scale. 
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Figure 5: Time evolution of the same oscillators as Figure[2] but now with the new couplings generated after the effective 
reinforcement shown in Figure |4] Graphs (a) and (b) show the three oscillators, (c) the phase differences, and (d) the 
response. We see that the response goes to a value close to zero, consistent with the reinforcement of zero. 
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natural frequency ujq and the same amplitude. We write 



s{t) = Acos{ojQt) = A cos {(ps{t)) , (15) 
ri{t) = Acos{uJot + S(f>i) — Acos{^pri{t)) , (16) 
r2{t) = Acqs {ujQt + 6(1)2) ^ Acqs (ipr^it)) , (17) 

where s{t), ri(t), and r2(t) represent harmonic oscillations, ips(t), fnit), and (pr2{t) their phases, 
d(j)i and 5(j)2 are constants, and A their amplitude. Notice that, since all oscillators have the same 
amplitude, their dynamics is completely described by their phases. Since neural oscillators have a 
wave-like behavior (iNunez and Srinivasan 2006), their dynamics satisfy the principle of superposition, 



thus making oscillators prone to interference effects. As such, the mean intensity, as usually defined 
for oscillators, give us a measure of the excitation carried by the oscillations. The mean intensity Ii is 
the superposition of s(t) and ri{t), or 



where ()j is the time average. A quick computation gives 

h = A2(1 + cos((S(/)i)), 

and, similarly for I2, 

l2^A^{l + cos{S^2)). 

Therefore, the intensity depends on the phase difference between the response-computation oscillators 
and the stimulus oscillator. 

Let us call and I2 the intensities after learning. The maximum intensity of and I2 is 
2A'^, whereas their minimum intensity is zero. Thus, the maximum difference between these intensities 
happens when their relative phases are such that one of them has a maximum and the other a minimum. 
For example, if we choose S(j>i = and 5(j)2 = tt, then I^max = (1 + cos (0)) = 2A'^ and l2min = 
A^ (1 -I- cos (tt)) — 0. Alternatively, if we choose i5^i = tt and (502 = 0, Iimin = (1 + cos (tt)) = 
and l2max = (1 + cos (0)) — 2A^. However, this maximum contrast should only happen when the 
oscillator learned a clear response, say the one associated with oscillator ri{t). When we have in- 
between responses, we should expect less contrast, with the minimum happening when the response 
lies on the mid-point of the continuum between the responses associated to ri(t) and r2(t). This 
happens if we impose 

5(f>i ^S4>2+TT = Scj), (18) 



which results in 
and 



=A^{l + cos{d^)), (19) 



==^2(1 -cos ((50)). (20) 



From equations (19 I and (20), let & G [—1, 1] be the normalized difference in intensities between ri and 
r2, i.e. 

^ ^ !l=cos(<5¥.), (21) 



It + I: 



2 



< Sip < TT. So, in principle we can use arbitrary phase differences between oscillators to code for a 
continuum of responses between —1 and 1 (more precisely, because S^p is a phase, the interval is in the 
unit circle T, and not in a compact interval in M). For arbitrary intervals (Ci,C2)i that is required 
is a re-scaling of b. 
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We now turn to the mathematical description of these qualitative ideas. As we saw, we are as- 
suming that the dynamics is encoded by the phases in equations (15)-(17l. We assume that stimulus 



and response-computation neural oscillators have natural frequencies wo, such that their phases are 
ip(t) = uJot + constant, when they are not interacting with other oscillators. The assumption of the 
same frequency for stimulus and response-computation oscillators is not necessary, as oscillators with 



different natural frequency can entrain (Kuramoto 19841, but it simplifies our analysis, as we focus on 



phase locking. In real biological systems, we should expect different neural oscillators to have different 
frequencies. We also assume that they are (initially weakly) coupled to each other with symmetric 
couplings. The assumption of coupling between the two response-computation oscillators ri and r2 
is a detailed feature that has no direct correspondence in the SR model. We are also not requiring 
the couplings between oscillators to be weak after learning, as learning should usually lead to the 
strengthening of couplings. This should be contrasted with the usual requirements of weak couplings 



in the Kuramoto model ( Kuramoto 1984 ) 



At the beginning of a trial, a stimulus is sampled, and its corresponding oscillator, Sj, along with ri 
and r2, are activated. The sampling of a stimulus is a stochastic process represented in SR theory by 
the random variable S„, but we do not attempt to model it in detail from neural oscillators. Instead, 
we just assume that the activation of the oscillators happens in a way consistent with SR theory; a 
more detailed model of activation that includes such sampling would be desirable, but goes beyond the 
scope of this paper. Once the stimulus and response-computation oscillators are activated, the phase 
of each oscillator resets according to a normal distribution centered on zero, i. e., Tp = 0, with standard 
deviation a^, which we here assume is the same for all stimulus and response-computation oscillators. 
(We choose Tp = without loss of generality, since only phase differences are physically meaningful. 
A Gaussian is used to represent biological variability. A possible mechanism for this phase reset can 
Wang| ( [l995| ).) Let 



be found in 
n, and let At 
and r2 



be the time at which the stimulus oscillator is activated on trial 



be the average amount of time it takes for a response to be selected by oscillators ri 
We use Atr as a parameter that is estimated from the experiments, but we believe that more 
detailed future models should be able to predict the value of Atr from the dynamics. Independently 
of n, the probability density for the phase at time tg^n is given by 



exp 



(22) 



where i — Sj,ri,r2. After the stimulus is sampled, the active oscillators evolve for the time inter- 
val Atr according to the following set of differential equations, known as the Kuramoto equations 
(Hoppensteadt and Izhikevich 1996a|b Kuramoto 19841. 



dt 



ky sin {ipi 



ipj + 5ij) , 



(23) 



where ky is the coupling constant between oscillators i and j, and 5ij is an anti-symmetric matrix 
representing the phase differences we wish to code, and i and j can be either s, ri, or r2. 



For N stimulus oscillators 



1, 



N, we may now rewrite (23 1 for the special case of the 



three oscillator equations for Sj, ri, and r2. We introduce in these equation notation for excitatory 
(kf) and inhibitory (/c^) couplings. These are the AN excitatory and inhibitory coupling strengths 



between oscillators (a more detailed explanation of how (24|-(26) are obtained from (23 1, as well as 
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the physical interpretation of the coefficients kl_^ , 



■ ' ^^v 1-2' ^ri r2 '^^^ found in subsection 4.2 I 



~dt 



diPn 
dt 



dt 



^0 - fcf;,ri sin ((/3s, - </3ri) 
r2 sin ((/3s, -(^r^) 
n COS (.^s, - Vri) 
r2 '^OS ('^'^j -Vr2) , 
^0 - fc^i.s, sin {ipr, - (fis,) 

r2 Sin(¥'ri - (^r^) 
s, COS (.^rj - ip,^) 

-Hi.r2 COS ((^n - (fri) , 

- fcf,,s, sin((^r, - </9s,) 
-^^2,^1 sin ((^,-2 - (/3rJ 

cos {^r2 - fs,) 
COS {ipr2 -fri), 



-ki. 
-ki. 



-k^ 

-k' 



(24) 



(25) 



-k^ 

-k^ 



(26) 



where ipsj, frn and (pr2 are their phases, uiq their natural frequency. Equations (24|-(26P us uall y 
contain the amplitudes of the oscillators as a coupling factor. For example, instead of just fc^ 
the standard form of Kuramoto's equation would have a As Ar-^^kf. 



. r, in(l24j), 
term multiplying sin((/3s^- - iprj 



(|Kuramoto[ 1984|. For simplicity, we omit this term, as it can be absorbed by kf^ if the amplitudes 



are unchanged. Before any conditioning, the values for the coupling strengths are chosen following 
a normal distribution g{k) with mean k and standard deviation ak- It is important to note that 
reinforcement will change the couplings while the reinforcement oscillator is acting upon the stimulus 
and response oscillators, according to the set of differential equations presented later in this section. 



The solutions to (24|-(26l and the initial conditions randomly distributed according to f{fi) give us 
the phases at time tr,„ — is,n + Ai^- (Making Atr a random variable rather than a constant is a 
realistic generalization of the constant value we have used in computations.) The coupling strengths 
between oscillators determine their phase locking and how fast it happens. 



4-.1. Oscillator Dynamics of Learning from Reinforcement 

The dynamics of couplings during reinforcement, corresponding to changes in the conditioning „. 
A reinforcement is a strong external learning event that drives all active oscillators to synchronize 
with frequency We to the reinforcement oscillator, while phase locking to it. We choose We ^ i^o 



to keep its role explicit in our computations. In (24|-(26l there is no reinforcement, and as we 
noted earlier, prior to any reinforcement, the couplings , . . . , kf^ are normally distributed with 

mean k and standard deviation Uk- To develop equations for conditioning, we assume that when 
reinforcement is effective, the reinforcement oscillator deterministically interferes with the evolution 
of the other oscillators. This is done by assuming that the reinforcement event forces the reinforced 
response-computation and stimulus oscillators to synchronize with the same phase difference of 5lp, 
while the two response-computation oscillators are kept synchronized with a phase difference of tt. 
Let the reinforcement oscillator be activated on trial n at time t^^m ir,n+i > ie,n > ^r,n! during an 
interval Afg. Let Kq be the coupling strength between the reinforcement oscillator and the stimulus 
and response-computation oscillators. In order to match the probabilistic SR axiom governing the 
effectiveness of reinforcement, we assume, as something beyond Kuramoto's equations, that there is 
a normal probability distribution governing the coupling strength if g between the reinforcement and 
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the other active oscillators. It has mean Kg and standard deviation axo- Its density function is: 

1 



f{Ko) = exp {-^ (i^o - ^o)'| 



(27) 



As already remarked, a reinforcement is a disruptive event. When it is effective, all active oscillators 
phase-reset at te,ni and during reinforcement the phases of the active oscillators evolve according to 
the following set of differential equations. 



dt 



dt 



dfr2 

dt 



Wo - 






r. sin 


-kl 


cos 


-kl 


COS 



-i^OSin (ips^ - UJet) , 

'^o - kf^^^^ sin {ipr^ - ip,^) 

-fcfi,r2 Sin((^ri - ^T2) 
-fc^i,s, COS {ipr, -Vs,) 
-ki,,r2 COs((^^i - <fr2) 

—Kq sin {(pn — <-^et — 5(p) , 



Wo - 


kr2,s, sin {f, 


2 - 




n sin {(pr2 - 


fri) 


-k' 


COS {(pr^ - 




-k' 


n COS {ipr2 - 





—Kq sin {(fir^ — u)gt — 5ip + tt) . 



(28) 



(29) 



(30) 



The excitatory couplings are reinforced if the oscillators are in phase with each other, according to the 
following equations. 



dt 



dki 



dt 



dki. 



dk, 



dt 

E 



dt 
dk^ 

dt 
dk^ 

dt 





a cos (ips. - <Pri) 


kfj,ri 


(31) 


e(^o) 


a COS {(fis. - (Pr2) 


- k^ 


(32) 


eiKo) 


aCOs(<^ri - Vr2) 


'^ri,r2i 


(33) 


e(^o) 


a COS {(fis. - (fin) 


- k^ 


(34) 


e(^^o) 


a COS (ips. - <Pr2) 


'^r2,Sj 


(35) 


e(ifo) [a COS {(pr^ - (/?^J 


- k^ 1 


(36) 
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Similarly, for inhibitory connections, if two oscillators are perfectly off sync, then we have a reinforce- 
ment of the inhibitory connections. 



dt 



dki 



dt 
dk{._^j.^ 
dt 

dt 

dt 

dh^ 

" r2,r'i 

dt 



In the above equations. 



— ^l^oj 


a sin (^ifig. — (fin) 






= e (i^o) 


a sin (g9s, — osr,) 




(38) 


= e{K,) 


asin((^ri - (ySra) 


~ ^'-1,1-2] 


(39) 


= e(K^) 


a sin ((ySri - Vs,) 


- 


(40) 


= ^(Ko) 


asm (ipr2 - fsj) 


- k^ 


(41) 


- eiKo) 


asm{ipr2 - iPn) 


- k' 1 


(42) 




/ if iiTo < ^' 
1 Eo otherwise. 




(43) 



a threshold constant throughout all trials. The function e(ii'o) represents nonlinear effects in the brain. 



These effects could be replaced by the use of a sigmoid function eo(l + exp{— 7(^0 — if')}) ^ (Eeckman 
and Freeman 1991[ ) in (31 1-(42), but we believe that our current setup makes the probabilistic features 



clearer. In both cases, we can think of K' as a threshold below which the reinforcement oscillator has 
no (or very little) effect on the stimulus and response-computation oscillators. 



Before we proceed, let us analyze the asymptotic behavior of these equations. From (31|-(42l and 
with the assumption that Kq is very large, we have, once we drop the terms that are small compared 
to Ko, 



dt 

dipri 

dt 
dt 



UJo - Kq sin (ips^ - UJet) , 

ujo — Ko sin ((p,.j — Lo^t — Sip) , 

OJo — Ko sin {ipr2 — UJet — Sip + tt) . 



(44) 
(45) 
(46) 



It is straightforward to show that the solutions for (44|-(46l converge, for t > 2/ — (uje — ujq)'^ 
and >• (wq — i^e)^, to ips 



ujf,t — TT and ipr2 — i^et if 7^ ^0- So, the effect of the new 
terms added to Kuramoto's equations is to force a specific phase synchronization between sj, ri, and 
r2, and the activated reinforcement oscillator. This leads to the fixed points ips = uJet, fri = cuet + dip, 
ipr2 = cdet + Sip — TT. With this reinforcement, the excitatory couplings go to the following asymptotic 

a cos (Sip), fc^ ^2 = fc^ — ~a, and the inhibitory ones go 
and kl, ^„ = k 



values kfj.^ 
to ki 



ri,s 



-k^ 

^s,r2 



'^r2,s 



= -K^,s = -H,r2 = K2,s = as\rL{Sip), 



'r2,ri 



— 0. We show in 



Appendix A 



these couplings lead to the desired stable fixed points corresponding to the phase ditterences Sip. 
Sip = Q IS reinforced. 



that 



We now turn our focus to equations (31|-(42l 
For Ko > K' , ips- and ip, 



For simplicity, we will consider the case where 
evolve, approximately, to cjgt -I- tt, and ipr2 to ajgt. 
Thus, some couplings in (31 1-(42) tend to a solution of the form a + ci exp(— eo^); whereas others tend 
to —a-\-C2 exp(— egt), with ci and C2 being integration constants. For a finite time t > 1/eo- depending 
on the couplings, the values satisfy the stability requirements shown above. 

The phase differences between stimulus and response oscillators are determined by which reinforce- 



ment is driving the changes to the couplings during learning. From (31)-(42l, reinforcement may be 
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effective only if At^ > ^ (^^^ Seliger et al. (20021 and Appendix A.6I, setting a lower bound for ep 



as Ate is fixed by the experiment. For values of At^ > eg , the behavioral probability parameter 9 of 
effective reinforcement is, from (27 1 and (43 1, reflected in the equation: 



fiKo) dKo. 



(47) 



K' 



This relationship comes from the fact that if Kq < K' . there is no effective learning from reinforcement, 



since there are no changes to the couplings due to (31|-(42l, and (24|-(26) describe the oscillators' 



behavior (more details can be found in Appendix A). Intuitively K' is the effectiveness parameter. 
The larger it is, relative to Kq, the smaller the probability the random value Kq of the coupling will 



be effective in changing the values of the couplings through (31)-(42l 



Equations (31 1-(42) are similar to the evolution equations for neural networks, derived from some 
reasonable assumptions in 'Hoppensteadt and Izhikevich ( 1996a|b I. The general idea of oscillator 
learning similar to (31 |-(42l was proposed in Seliger et al. (20021 and Nishii (1998j) as possible learning 
mechanisms. 

Thus, a modification of Kuramoto's equations, where we include asymmetric couplings and in- 
hibitory connections, permit the coding of any desired phase differences. Learning becomes more 
complex, as several new equations are necessary to accommodate the lack of symmetries. However, 
the underlying ideas are the same, i.e., that neural learning happens in a Hebb-like fashion, via the 
strengthening of inhibitory and excitatory oscillator couplings during reinforcement, which approxi- 
mates an SR learning curve. 

In summary, the coded phase differences may be used to model a continuum of responses within 
SR theory in the following way. At the beginning of a trial, the oscillators are reset with a small 



fluctuation, as in the one-stimulus model, according to the distribution (22). Then, the system evolves 



according to (24)-(26l if no reinforcement is present, and according to (28l-(42l if reinforcement is 



present. The coupling constants and the conditioning of stimuli are not reset at the beginning of each 
trial. Because of the finite amount of time for a response, the probabilistic characteristics of the initial 
conditions lead to the smearing of the phase differences after a certain time, with an effect similar to 



that of the smearing distribution in the SR model for a continuum of responses ( Suppes and Frankmann 



(19611; Suppes et al. (1964l). We emphasize that the smearing distribution is not introduced as an 
extra feature of the oscillator model, but comes naturally from the stochastic properties of it. 

Oscillator Computations for Two-Response Models. An important case for the SR model is when 
participants select from a set of two possible responses, a situation much studied experimentally. We 
can, intuitively, think of this as a particular case where the reinforcement selects from a continuum 
of responses two well-localized regions, and one response would be any point selected in one region 
and the other response any point selected on the other region. In our oscillator model, this can be 
accomplished by reinforcing, say, Sip = 0, thus leading to responses on a continuum that would be 
localized in a region close to 6 = 1, and small differences between the actual answer and 1 would 
be considered irrelevant. More accurately, if a stimulus oscillator phase- locks in phase to a response- 
computation oscillator, this represents in SR theory that the stimulus is conditioned to a response, but 
we do not require precise phase locking between the stimulus and response-oscillators. The oscillator 
computing the response at trial n is the oscillator whose actual phase difference at time ir,n is the 
smaller with respect to Sj, i.e., the smaller c^. 



1 

2^ 



1 

2^ 



I - Vsj I 



(48) 



where \_x\ is the largest integer not greater than x, and i = 1,2. 
4-.2. Physical Interpretation of Dynamical Equations 



A problem with equation (23 1 is that it does not have a clear physical or neural interpretation. 
How are the arbitrary phase differences 5ij to be interpreted? Certainly, they cannot be interpreted by 
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the excitatory firing of neurons, as they would lead to Sij being either or tt, as shown in Appendix A 



Furthermore, since the kij represent oscillator couplings, how are the phase differences stored? To 



answer these questions, let us rewrite (23 1 as 



dipi 



i - ^ fcjj cos ((5.y) sin [ipi - ipj) 
j 

hj sin {6ij) cos {(fi - ipj) . 



(49) 



Since the terms involving the phase differences Sij are constant, we can write (49) as 

d(pi 

3 



dt 



^ [fc,g sin {lp, ~'ipj) + kl^ cos [ip, - Lpj)] , 



(50) 



where kfi 



Equation ( 23 ) now has an immediate physical inter- 



— kij cos {6ij) and fc/^- = kij sin {6ij] 

Since fc£ makes oscillators ipi and pj approach each other, as before, we can think of them 

When a neuron ipj fires 



pretation. .^^^ 

as corresponding to excitatory couplings between neurons or sets of neurons 
shortly before it is time for ipi to fire, this makes it more probable ipi will fire earlier, thus bringing its 
rate closer to (pj. On the other hand, for kjj the effect is the opposite: if a neuron in cpj fires, fcf^ makes 
the neurons in (pi fire further away from it. Therefore, kfj cannot represent an excitatory connection, 
but must represent inhibitory ones between ipi and ipj. In other words, we may give physical meaning 
to learning phase differences in equations ( 23 I by rewriting them to include excitatory and inhibitory 
terms. 

We should stress that inhibition and excitation have different meanings in different disciplines, and 
some comments must be made to avoid confusion. First, here we are using inhibition and excitation 
in a way analogous to how they are used in neural networks, in the sense that inhibitions decrease 
the probability of a neuron firing shortly after the inhibitory event, whereas excitation increases this 
probability. Similarly, the inhibitory couplings in equations (50) push oscillators apart, whereas the 
excitatory couplings bring them together. However, though inhibitory and excitatory synaptic cou- 
plings provide a reasonable model for the couplings between neural oscillators, we are not claiming to 
have derived equations (50) from such fundamental considerations. In fact, other mechanisms, such as 



asymmetries of the extracellular matrix or weak electric-field effects, could also yield similar results. 
The second point is to distinguish our use of inhibition from that in behavioral experiments. In animal 



experiments, inhibition is usually the suppression of a behavior by negative conditioning (Dickinson 
1980). As such, this concept has no direct analogue in the standard mathematical form of SR the- 
ory presented in Section [2] But we emphasize that our use of inhibition in equations ( 50 ) is not the 
behavioral one, but instead closer to the use of inhibition in synaptic couplings. 

Before we proceed, another issue needs to be addressed. In a preliminary analysis of the fixed points, 
we see three distinct possibilities: (f>ij = (j)2j = 0, (ii) (pij = tt, (f)2j — 0, and (iii) (f>ij = 0, (f)2j = tt. 
A detailed investigation of the different fixed points and their stability is given in [Appendix A[ For 
typical biological parameters and the physical ones we have added, the rate of convergence is such 
that the solutions are well approximated by the asymptotic fixed points by the time a response is 
made. But given that we have three different fixed points, how can ri be the conditioned response 
computation oscillator (fixed point iii) if r2 can also phase-lock in phase (fixed point ii)? To answer 
this question, we examine the stability of these fixed points, i.e., whether small changes in the initial 
conditions for the phases lead to the same phase differences. In Appendix A| we linearize for small 
fluctuations around each fixed point. As shown there (Equation [Appendix A. 81 ), a sufficient condition 
for fixed point (0, tt) to be stable is 
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and the other fixed points are unstable (see Fig. |6] below); similar results can be computed for the 
other fixed points. It is important to notice that, although the above result shows stability, it does 
not imply synchronization within some fixed finite time. Because neural oscillators need to operate in 
rather small amounts of time, numerical simulations are necessary to establish synchronization in an 
appropriately short amount of time (though some arguments are given in Appendix A. 4 with respect 
to the time of convergence and the values of the coupling constants). If the couplings are zero, the 
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Fi gure 6: Field plot, generated using Maple 10, for couplings fcs^^n — 
show that the fixed point at (0, tt) is stable, whereas (0,0), (tt, 0), (7r,7r), 
Thus, for a randomly selected initial condition, in a finite amount of time 
from r2- 



= —k and A:sj,r2 = k, k > 0. Field lines 
(tt/S, 27r/3), and (27r/3,7r/3) are unstable. 
Sj approaches the phase of r\ and departs 



model still picks one of the oscillators: the one with the minimum value for q in (48 1. But in this case 



because of the small fluctuations in the initial conditions, the probability for each response is 1/2, as 
expected, corresponding, obviously, to the SR parameter p = 1/2 in the one-stimulus model. 



4-.3. Parameter values 



We turn now to the parameters used in the oscillator models. In the above equations, we have as 
parameters N , wqi ^e, ck, At^, Aig, fc, (7^, ^, a^p, cq, Kq, ukq, and K' . This large number of parameters 
stands in sharp contrast to the small number needed for SR theory, which is abstract and therefore 
much simpler. In our view, any other detailed physical model of SR theory will face a similar problem. 
Experimental evidence constrains the range of values for loq, uj^, At^, and Aig, and throughout this 
paper we choose wo and to be at the order of 10 Hz, At^ = 200 ms, and Aig = 400 ms. These 
natural frequencies were chosen because: (i) they are in the lower range of frequencies measured for 
neural oscillators ( Freeman and Barrie[ |1994[ Friedrich et al. 2004 Kazantsev et al. 2004 Murthy 



and Fetz 1992 Suppes and Han 2000 Tallon-Baudry et al. 20011, and the lower the frequency, the 



longer the time it takes for two oscillators to synchronize, which imposes a lower bound on the time 
taken to make a response, (ii) data from [Suppes et al] ( |1997[ |1998| |1999b|a| ) ; [Suppes and Han| ( [2000[ ) 
suggest that most frequencies used for the brain representation of language are close to 10 Hz, and (iii) 
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choosing a small range of frequencies simplifies the oscillator behavior We use At^ — 200 ms, which 
is consistent with the part it plays in the behavioral response times in many psychological experiments; 
for extended analysis of the latter see Luce ( 1986 1. The value for At, = 400 ms is also consistent with 
psychological experiments. 

Before any conditioning, we should not expect the coupling strengths between oscillators to favor 
one response over the other, so we set fc = Hz. The synchronization of oscillators at the beginning of 
each trial implies that their phases are almost the same, so for convenience we use = 0. The standard 
deviations and are not directly measured, and we use for our simulations the reasonable values 
of (Tfc = IQ-'^ Hz and = tt/A Hz. Those values were chosen because ak should be very small, since 
it is related to neuronal connections before any conditioning, and should allow for a measurable 
phase reset at the presentation of a stimulus, so cr^ < 1. 

The values of a, eg, Kq, and ax^ are based on theoretical considerations. As we discussed in the 
previous paragraph, Kq is the mean strength for the coupling of the reinforcement oscillator, and aj^^ 
is the standard deviation of its distribution. We assume in our model that this strength is constant 
during a trial, but that it varies from trial to trial according to the given probability distribution. The 
parameter a is related to the maximum strength allowed for the coupling constants ks-^m ks^^r^i and 
kri^r2- Because of the stability conditions for the fixed points, shown above, any value of a leads to 
phase locking, given enough time. The magnitude of a is monotonically related to how fast the system 
phase locks. It needs to be at least of the order of 1/ Atr if the oscillators are to phase lock within the 
behavioral response latency. Another important parameter is cq, and we can see its role by fixing the 
phases and letting the system evolve. In this situation, the couplings converge exponentially to a with 
a characteristic time e^^, so eg determines how fast the couplings change. The time of response, Atr, 
given experimentally, sets a minimum value of 5 Hz for a, since a > At^^ — 5 Hz is needed to have 
phase locking from Kuramoto-type equations. The time of reinforcement. At,, and natural frequency 
ujQ are related to eg by > ^^<r^ =2.5 Hz and eo ujq. Finally, to have an effective reinforcement, 
Kq must satisfy Kq 3> wq. These theoretical considerations are consistent with the values a = 10 Hz, 
eo = 3 Hz, Kq — 4,000 Hz, and (Tko — 1,000 Hz. We note one distinction about the roles of the 
three probability distributions introduced. Samples are drawn of phases f and reinforcement oscillator 
coupling strength Kq on each trial, but oscillator couplings fc^ , . . . , kl^ , k^ are sampled only 
once at the beginning of a simulated experiment and then evolve according to (31|-(42l. 

Table [l] summarizes the fixed parameter values used in our simulations, independent of considering 



Table 1: Fixed values of parameters used to fit the oscillator models to SR experiments. 



Parameter 


Value 


Parameter 


Value 


a 


10 Hz 




7r/4 


Wo 


10 Hz 


k 


Hz 


We 


12 Hz 




10-3 Hz 


Atr 


200 ms 




1,000 Hz 


At, 


400 ms 


Kq 


4,000 Hz 







(a 


3 Hz 



the experimental design or data of a given SR experiment. This leaves us with two remaining free 
parameters to estimate from SR experimental data: the number of stimulus oscillators N and the 
nonlinear cutoff parameter K' . These two parameters have a straightforward relation to the SR ones. 
N corresponds to the number of stimuli, and K' is monotonically decreasing with the effectiveness of 
the reinforcement probability 9, as shown in (47 1. 



■^However, we should mention that none of the above references involve reinforcement, though we are working with 
the assumption that reinforcement events are also represented by frequencies that are within the range suggested by 
evidence. 
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Fig. [7] exemplifies the phase-difference behavior of three oscillators satisfying the Kuramoto equa- 
tions. In the figure, as shown, all oscillators start with similar phases, but at 200 ms both sj and ri 




Figure 7: MATLAB computation of the piiase differences between three coupled oscillators evolving according to Ku- 
ramoto's equations with couplings fcs^,ri = ^0 = — fesj,r2 = kri,r2 ^^'^ '^o = 207rs^^ (10 Hz). The solid line represents 
<l>isj = Vri - fsj , and the dashed (t)2sj = <Pr2 - <fis- 

lock with the same phase, whereas r2 locks with phase difference tt. Details on the theoretical relations 
between parameters can be found in [Appendix A[ 



5. Comparison with Experiments 

Although the oscillator models produce a mean learning curve that fits quite well the one predicted 
by SR theory, it is well known that stochastic models with underlying assumptions that are qualitatively 
quite different may predict the same mean learning curves. The asymptotic conditional probabilities, 
on the other hand, often have very different values, depending on the number N of sampled stimuli 
assumed. For instance, the observable conditional probability density j(x„|?/„_iy„_2) for the one- 
stimulus model is different from the conditional density for the A^-stimulus models, {N > 1) as is 
shown later in this section. 

In the following, we compare some behavioral experimental data to such theoretical quantities and 
to the oscillator-simulation data. We first compare empirical data from an experiment with a contin- 
uum of responses to the oscillator-simulation data. Second, we show that in a probability matching 
experiment (Suppes and Atkinson 1960 Chapter 10) the experimental and oscillator-simulated data 
are quite similar and fit rather well to the SR theoretical predications. Finally, we examine the paired- 
associate learning experiment of Bower (19611, and model it in the same way. Here we focus on the 
model predictions of stationarity and independence of responses prior to the last error. Again, the 
original experimental data, and the oscillator-simulation data exhibit these two properties to about 
the same reasonably satisfactory degree. 



5.1. Continuum of Responses 

As shown in Section [2] to fit the continuum-of-responses SR model to data, we need to estimate N, 
the number of stimuli, 9, the probability that a reinforcement is effective, and Ks{x\z), the response 
smearing distribution of each stimulus s over the set of possible responses. As we saw in Section 3, 



in the oscillator model 9 comes from the dynamics of learning (see equation ( 47 ) and corresponding 
discussion), whereas N is external to the model. The smearing distribution in the oscillator model 
comes from the stochastic nature of the initial conditions coupled with the nonlinear dynamics of 
Kuramoto's equations. 



Let us now focus on the experiment, described in Suppes et al. (19641. In it, a screen had a 5-foot 
diameter circle. A noncontingent reinforcement was given on each trial by a dot of red light on the 
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Figure 8: On the left histogram of the reinforcement angles used in the oscillator simulation, and on the right both the 
SR theoretical density of the one-stimulus model and the histogram of the oscillator-simulated responses. 



circumference of the 5-foot circle, with the position determined by a bimodal distribution /(y), given 

by 



/(y) 



< x < f 

f < X < TT 



(2^ - y) , 



3tt_ 
2 



TT < a; < 
^ < x < 27r. 



(51) 



At the center of this circle, a knob connected to an invisible shaft allowed the red dot to be projected 
on the screen at any point on the circle. Participants were instructed to use this knob to predict on 
each trial the position of the next reinforcement light. The participants were 4 male and 26 female 



Stanford undergraduates. For this experiment, Suppes et al. (19641 discuss many points related to the 



predictions of SR theory, including goodness of fit. Here we focus mainly on the relationship between 
our oscillator-simulated data and the SR predicted conditional probability densities. 

Our main results are shown in Figures [8f |ll| The histograms were computed using the last 400 
trials of the 600 trial simulated data for each of the 30 participants, for a total of 12,000 sample points, 
and b was reparametrized to correspond to the interval (0, 27r). The parameters used are the same as 
those described at the end of Section |4j with the exception of Kq — 4468, chosen by using equation 
(47 1 such that the learning effectiveness of the oscillator model would coincide with the observed 
value of 6* = 0.32 in Suppes et al. ( |1964[ ). Each oscillator's natural frequency was randomly selected 
according to a Gaussian distribution centered on 10 Hz, with variance 1 Hz. Figure [8] shows the effects 
of the smearing distributions on the oscillator-simulated responses. Figure |9] shows the predicted 
conditional response density j{xn\yn-i) if reinforcement on the previous trial (n — 1) happened in the 
interval (tt, 37r/2). As predicted by SR theory, the oscillator-simulated data also exhibit an asymmetric 
histogram matching the predicted conditional density. We computed from the simulated data the 
histograms of the conditional distributions j(x„\yn-iyn-2), shown in Figure 10 Finally, if we use our 
simulated data to generate histograms corresponding to the conditional densities j(a;„|t/„_ia;„_i), we 



obtain a similar result to that of the one-stimulus model (see Figure 111. We emphasize that, as in 
the iV-stimulus oscillator model used for the paired-associate learning experiment, the continuum-of- 
response oscillator model can be modified to yield data similar to the statistical predictions of the SR 
model, if we add oscillators to represent additional stimuli. 

Up to now we showed how the oscillator-simulated data compare to the experiments in a tangential 
way, by fitting them to the predictions of the SR theory, which we know fit the behavioral data rather 
well. Now we examine how our model performed when we compare directly oscillator-simulated and 
the behavioral empirical data. For the continuum of responses, we focus on the data presented in 
Figures 2 and 5 of Suppes et al. (19641, corresponding to our Figures |8] and |9] As the original data are 
not available anymore, to make a comparison to our simulations we normalized the simulated-data his- 
tograms and redrew them with the same number of bins as used in the corresponding figures of [Suppes] 



24 



150 




Figure 9: Histogram for the oscillator-simulated response x„ on trial n conditioned to a reinforcement on trial n — 1 in 
the interval (tt, 3it/2). The black line shows the fitted SR theoretical predictions of the one-stimulus model. 



et al. (1964). We then performed a two-sample Kolmogorov-Smirnov test to compare the empirical 



behavioral data with the oscillator-simulated data, the null hypothesis being that the two histograms 
originated from the same distribution (Keeping, ,1995j . For the simulated response histogram of Figure 
[8| the Kolmogorov-Smirnov test yields a p-value of 0.53. So, we cannot reject the null hypothesis, that 
the simulated data come from the same distribution as the corresponding behavioral data. For the 
asymmetric conditional probability of Figure |9] the Kolmogorov-Smirnov test yields a p-value of 0.36, 
once again suggesting that the oscillator simulated data are statistically indistinguishable from the 
empirical data. 



5.2. Experiment on probability matching 



The second experiment that we modeled with oscillators is a probability-matching one, described 
in detail in Suppes and Atkinson (1960 Chapter 10). The experiment consisted of 30 participants, 
who had to choose between two possible behavioral responses, i?i or R2. A top light on a panel would 
turn on to indicate to the participant the beginning of a trial. Reinforcement was indicated by a light 
going on over the correct response keyj^ The correct response was noncontingent, with probability 
P = 0.6 for reinforcing response Ri and 0.4 for R2. The experiment consisted of 240 trials for each 
participant. 

To compare the oscillator-simulated data with the results for the probability-matching experiment, 
we first computed, using the behavioral data for the last 100 trials, the log pseudomaximum likelihood 
function (Suppes and Atkinson 1960 p. 206) 



L{e,N)^ ^ n,,-fclogPoo(i?fe ,n-f 1 ,n ) t 

1 

where Uij^k is the observed number of transitions from Ri and Ej on trial n to Rk on trial 71 + 1, and 
the SR theoretical conditional probabilities Poo {Rk,n+i\Ej^nRi,n) are: 



^The widely used notation, Ri for response i and Ej for reinforcement j, for experiments with just two responses is 
followed in describing the second and third experiments. 



25 




Figure 10: Histograms of oscillators-simulated responses conditioned to reinforcement on the two previous trial. All 
graphs correspond to simulated responses with reinforcement on trial n—1 being in the interval (tt, 37r/2). S corresponds 
to reinforcements on trial n — 2 occurring in the same interval. L corresponds to reinforcement on n — 2 occurring on the 
interval {3tt/2,2tt), A on the interval (0, 7r/2), and P on (7r/2,7r). The solid lines show the SR theoretical one-stimulus 
model predictions. 
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Figure 11: Histograms of oscillators-simulated responses conditioned to reinforcements and simulated responses on the 
previous trial. All graphs correspond to simulated responses with reinforcement on trial n — 1 being in the interval 
(tt, 3n/2). S corresponds to responses on trial n — 1 occurring in the same interval. L corresponds to responses on n — 1 
occurring in the interval {3n/2,2n), A in the interval (0, 7r/2), and P in the interval (7r/2,7r). The solid lines are the 
predictions of the one-stimulus SR model. 
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(52) 
(53) 
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The log likelihood function L{9,N), with (3 = 0.6 as in the experiment, had maxima at 6* = 0.600 
ioi N = 3, 9 = 0.631 for = 4, and 6 — 0.611 for N = 3.35, where N is the log pseudomaximmn 
likelihood estimate of N (for more details, see Suppes and Atkinson (1960 Chapter 10)). We ran 
MATLAB oscillator simulations with the same parameters. We computed 240 trials and 30 sets 
of oscillators, one for each participant, for the three-, and four-stimulus oscillator models. Since the 
behavioral learning parameter 6 relates to K' , we used K' — 65 s^^ and K' — 56 s^^ for the three-, and 
four-stimulus models, respectively, corresponding to ^ = 0.600, and 6 — 0.631. Table [2] compares the 
experimentally observed conditional relative frequencies with the predicted asymptotic values for the 
N =3 or 4 SR models, and the corresponding oscillator simulations. In Table 2] P {Ri^n+i\Ei,nRi,n) 
is the asymptotic conditional probability of response i?i on trial n + 1 if on trial n the response was Ri 
and the reinforcement was Ei, and similarly for the other three expressions, for each of the conditions 
in Table 2, observed frequency, two oscillator simulations and two behavioral predictions. 



Table 2: Comparison between experimental results and the corresponding oscillator models. Theoretical results for the 
A'^-stimulus model are shown for comparison purposes. The first data row shows experimental values based on the last 
100 trials of the 30-subject group. The other rows show the SR theory asymptotic probabilities and simulations for 
the oscillator models averaged over the last 100 trials. The columns show the asymptotic conditional probabilities. For 
example, column _Ri|i?i_Ri shows the asymptotic probability P{Ri\EiRi) of response iJi on trial n given that on trial 
n — 1 the response was Ri and the reinforcement was Ei . 





Ri\EiRi 


i?l £'li?2 




Ri\E2R2 


Observed 


.715 


.602 


.535 


.413 


Oscillator 




3-stimulus 
simulation 


.705 


.574 


.522 


.381 


4-stimulus 
simulation 


.760 


.610 


.538 


.426 


SR-theory 




3-stimulus 
prediction 


.733 


.600 


.533 


.400 


4-stimulus 
prediction 


.700 


.608 


.542 


.450 



In Table |2] we compare the oscillator simulations to the behavioral data and the best fitting asymp- 
totic SR models. Each oscillator model was computed using the same estimated 9 value as the corre- 
sponding SR model. The fit of the SR model to the empirical data is quite good, as is the fit for the 
4-stimulus oscillator model. 



5.S. Experiment on paired-associate learning 



In this experiment, described in detail in Bower (1961), 29 participants learned a list of ten stim- 
ulus items to a criterion of two consecutive errorless cycles. The order of stimulus presentation was 
randomized for each trial. The visual stimuli were different pairs of consonant letters; the numerical 
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responses were the numbers 1 or 2, each number being assigned as correct to a randomly selected five 
stimuli for each participant. A participant was informed of the correct answer following each response. 
The one-stimulus model fitted the behavioral data the best. 

To model with oscillators, we used 29 independent sets of oscillators, each set with ten-stimulus, 
two-response, and two reinforcements, to reproduce Bower's setup. For our oscillator simulations we 



set = 1 and K' — 94 s^ , Kq = 90 Hz, and (Jko = 10 Hz . K' was chosen based on (47) and on 
Bower's statistical estimation of the probability of effective reinforcement for the one-stimulus model, 
which was = 0.344. 



Bower's behavioral data were also tested in Suppes and Ginsberg ( 1963 1 for the statistical properties 



of stationarity and independence prior to the last response error, a prediction of the one-stimulus SR 
model. We compare in Table |3] the behavioral results with those for our oscillator simulation. 

Stationarity. The first test is that of stationarity, i.e., whether the probability of a response 
oscillator phase- locking before the last error is constant. To perform this test, we restricted our 
oscillator-simulated data set to the M responses that happened before the last behavioral error of 
a given participant. We then divided this set into two, with the first M/2 trials being early trials 
and the remaining M/2 late trials. Let ni be the number of correct responses in the first half and 
7T.2 the number of correct responses in the second half. If the probability is stationary, then both 
nx/{M /2) and n2/(M/2) should be approximately (ni -|- ni)IM. We used a standard test for this 
null hypothesis of stationarity for the oscillator-simulated responses. 

Independence. Restricting ourselves again to trials prior to the last behavioral error, let be 
the number of transitions from state i to state j, where i and j can take values (correct response) or 
1 (incorrect). We use these numbers to estimate the transition probabilities p^. The null hypothesis 
of independence is that poo = Pio E^nd poi = Pii- The sample paths used to compute the oscillator 



Table 3: Comparison between the paired-associate experimental data and the simulated one-stimulus oscillator data. 
Notice that N is different for the experiment and the simulation, as is the number of responses prior to the last error, 
which varies for each run of t he simulations. 





Stationarity 


Experiment 


X"- = -97, N = 549, 1 
df = 6,p> .95 


Oscillator simulation 


= 2.69, TV = 748, 
d/ = 6, p > .8 




Independence 


Experiment 


X'^ = .97, N = 549, 
df = 6,p> .95 


Oscillator simulation 


= 0.1, N = 458, 
df = 6,p> .95 



results in Table [sjwere obtained by running simulations in MATLAB 7.1.0.183 (R14), SP 3 with the 
parameter values given earlier. The simulations consisted of randomly selecting, at each trial, the 
initial phase according to ( 22 1 and then computing phase evolution during Atr by numerically solving 
([24])-([56]) using MATLAB's built-in fourth-fifth order Runge-Kutta method. After the phase 
differences were computed and a response oscillator selected. At the beginning of reinforcement, new 
initial conditions and a random value for Kq were drawn according to (22) and (27), respectively. If 
Kq > K' , then the oscillators' couplings changed according to (28l-(42), and these equations were 
numerically solved for the time interval Ate also using fourth-fifth order Runge-Kutta, and the new 
values for the couplings were used on the next trial. 

The results in Table |3] show that we cannot sharply differentiate the behavioral and the oscillator- 
simulation data in testing the null hypotheses of stationarity and independence. 
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6. Conclusions and Final Remarks 



We modeled stimulus-response theory using neural oscillators based on reasonable assumptions from 
neurophysiology. The effects of the interaction between the neural oscillators were described by phase 
differences obeying Kuramoto's equations. Learning was modeled by changes in phases and couplings, 
driven by a reinforcement oscillator, with phase locking corresponding to behavioral conditioning. We 
compared the oscillator-simulated data to the data from three behavioral experiments, as well as to the 
corresponding behavioral-model predictions derived from SR theory. Our simulation results support 
the claim that neural oscillators may be used to model stimulus-response theory, as well as behavioral 
experiments testing the theory. 

From (47 1, behavioral and oscillator-simulated data relate the SR conditioning parameter 9 to K' , 
(7ko, and Kq. Despite the large number of parameters, our numerical oscillator simulations, all done 
with the same set of parameters, except N and K' , show that the statistical fit reported in Table [s] 
and conditional probabilities in Table [2] do not vary much even when these parameters are changed 
significantly. The most important constraint is this. Once all parameters are fixed then the stimulus- 
response conditioning parameter is a nonlinear monotonically decreasing function of K' given by 
(47 1. As our oscillator-simulations are refined and extended to more experiments, further constraints 
should narrow the parameter space. 

As is sometimes remarked, stimulus-response theory abstracts from many necessary processes that 
must have a physical realization. From a general psychological viewpoint, independent of the details of 
physical realization, undoubtedly the most obvious missing process is the perceptual recognition of the 
sampling of the same, or nearly the same stimulus, and of the reinforcement stimulus patterns different 
on repeated trials. We should be able to expand the present setup to include oscillators that recognize, 
by similarity or congruence computations, when a sampled stimulus type is one that has been previously 
sampled. Prior work on pattern recognition will be useful here. Our own earlier experimental work 
on computational models that recognize auditory or visual word, sentence or other brain images, such 
as those of red triangles or blue circles, provides substantial evidence that a single oscillator with 
one given natural frequency will be inadequate. Multiple oscillators with different natural frequencies 
will be required for such similarity recognition. For examples of such computational models applied 

Suppes et"ar] (|1997[ |1998[ |1999a|b( ) for Fourier methods, |de Barros 



to EEG-recorded brain data, see 



et al. ( 2006 ) for a Laplacian model, 



Suppes et al. ( 2009 ) for congruence between brain and perceptual 



( 2006 1 for perceptron models with regularization and independent 



feature of language, and Wong et al. 
component analysis. 

The oscillator computations developed in this paper provide a schematic model of psychological 
processes successfully described behaviorally using stimulus-response theory. The coupled neural phase 
oscillators used offer a physical mechanism to explain schematically how conditioning works in the 
brain. But many important physical details of such processes remain to be clarified. The oscillator- 
simulations do suggest many physically and biologically relevant measurements that are not part of 
stimulus-response theory. For example, although we used the simplifying assumption that a response 
occurs at a time Atj. after the onset of the stimulus, a more detailed model should derive a response-time 
distribution variance, with mean and variance at least being derived from biological assumptions about 
various physical processes, such as firing thresholds being reached due to oscillator synchronization. 
Another feature that could be further explored is the interference between oscillators. As we saw in 
Section [4] neural oscillators may interfere in ways similar to two electrical oscillators or two wave 
sources. This interference may lead to predictions distinct from those of classical SR theory, when 
modeling complex processes. Such predictions could, perhaps, be closer to behavioral models suggested 

,2006 ; Bruza et al. 2009| ) for possible alternatives) . 



by very different approaches (see (Busemeyer et al. 



Finally, we emphasize that neural oscillators may produce, in principle, measurable electrophysi- 
ological signals. Prior to having such measurements available as psychology and system neuroscience 
draw closer together, it is desirable that specific, even if only schematic, physical mechanisms be pro- 
posed to provide conceptually plausible physical realizations of fundamental psychological processes. 
The most important idea tested here is that the main physical mechanism in stimulus-response learning 
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is phase-locking, a concept widely used in physics, but not in psychology, or, even as yet, not much in 
system neuroscience. 



Appendix A. Properties of the Oscillator Model 



In this appendix we derive many important properties of the oscillator model. In Section |Ap-| 
Ipendix A.l[ we show that the convergence values for the reinforced couplings lead to some fixed points 
for equations (24|-(26) that correspond to the desired phase differences. Section Appendix A. 2 exam- 



ines the different fixed points, and verifies that the couplings converge to couplings such that the only 
stable fixed points are those corresponding to the reinforced phase difference 5if. Section pVppendix A.3| 
extends the results of Section pVppendix A.2| to the case of different coupling strengths, and establishes 
a more general condition for the stability of the fixed point corresponding to 5lp based on the relative 
signs of the couplings. This result is relevant, as the learning equations do not guarantee the exact 
values of couplings, but they can guarantee, when learning is effective, that couplings do satisfy the 
conditions presented in Section [Appendix A.3[ In Section [Appendix A.4|w e analyze the relationship 
between coupling strength and the convergence time of equations (24|-(26l toward a fixed point. Sec- 
tion |Xppendi^^3] investigates the behavior of the oscillators during reinforcement. Finally, in Section 



[Appendix A. 6 we show how some of the model parameters relate to the effectiveness of reinforcement 
9, deriving equation (47 1. 



Appendix A.l. Fixed points for coding phase relations 



Here we derive the asymptotic couplings for the reinforcement angles coding the phase relation S^p 
detailed in Section [4] and then show that such couplings lead to a set of differential equations with fixed 
points that are stable around solutions corresponding to the desired phase differences ( 18 1. First, let us 
start with the reinforcement schedule set such that </?s = Wgi, ^Pr-^ = + ^iid (p^.^ = Lo^t + 5lp — -k. 



From (28l-(30) we obtain the following fixed points for excitatory connections, which work as an 



attractor for the asymptotic behavior (for more details, see Appendix A. 5 1 
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(Appendix A. 6) 



Similarly, for inhibitory connections we have. 
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0, 

a sin (Sp) , 
—a sin {Sip) , 
0. 



(Appendix A. 7) 
(Appendix A. 8) 
(Appendix A. 9) 
(Appendix A. 10) 
(Appendix A. 11) 
(Appendix A. 12) 
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Assuming that a sufficient time interval elapsed such that the coupling coefficients converged to (Ap- 



pendix A. f (-(Appendix A.f2), Kuramoto's equations become 



= loq- a cos [Sip) sin {(ps - v?,-! ) 

+a cos {5(f) sin (ips — ) 
+Q; sin (Sip) cos {ips — 'Pn ) 
—a sin {5p}) cos {ips — 'Pr2 ) i 

= loq — a cos sin {ipr-^ ~ fs) 

+a sin {(pn — 'Pr2 ) 
—a sin ((5</?) cos ((^^ — Vs) i 

- loq + acos{dp)sm{pr2 ~ Vs) 
+a sin {pr2 - Vri ) 
+asin (Sp) cos ((p^a ~ '^s) ■ 

Regrouping the terms with the same phase oscillators, we have 
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ft is straightforward to verify that equations (Appendix A.f 3 (-(Appendix A.f 5 1 have the desired phase 



relations as fixed points. In the next section, we will show that not only are those fixed points stable, 
but that they are the only stable points modulo a 27r transformation (which in our case is not relevant) . 

Appendix A. 2. Stability of Fixed Points 



In [Appendix A.l we showed that a specific choice of couplings in the Kuramoto equations leads to 
a specific phase relation between oscillators. Here we derive the stability conditions. We assume that 
the couplings are such as required by phase differences ps = Pn ^ ^'f E^nd p^ 



TT, as coded 



in couplings (Appendix A.I (-(Appendix A. 12 1. To simplify our approach, we start with equations 
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(Appendix A. 13 (-(Appendix A. 15) and make the following change of variables: 



4)s = <y5s + 5lf, 
02 = Vr2 + TT. 



Equations (Appendix A. 13 (-(Appendix A. 15 1 then become 



dt 

d(f>i 
dt 



U(P2 

dt 



LUo - asm [(j)s ~ 01 j 

—a sin {(ps — 4>2) , (Appendix A. 16) 

uq - asm (01 - 0s) 

—a sin (01 — 02) , (Appendix A. 17) 

uq - asm (02 - 0s) 

— a sin (02 — 0i) . (Appendix A. 18) 



This change of variables allow us to re-write the dynamical equations without any references to the 
specific phase differences, and therefore we can obtain general results for any phase differences consis- 
tent with the desired relations for conditioning. Additionally, since we are mainly interested in phase 
differences, we make yet another change of variables, further simplifying our equations, by defining 



Substituting in (Appendix A. 16 (-(Appendix A. 18 1, we reduce the system to the following pair of 



coupled differential equations. 



= —2q sin (01s) — a sin (02s) (Appendix A. 19) 



dt 

-a sin (01s - 02s) 

i02s 



= — a sin (01s) — 2a sin (02s) (Appendix A. 20) 

+a sin (01s - 02s) ■ 

Because we are interested in general properties of stability, and we want to understand what conditions 
lead to the stable desired fixed points, it is convenient to not restrict our couplings to the ones imposed 
by a specific phase relation. Instead, we rewrite the above differential equations in terms of couplings 
that are not necessarily all the same, i.e.. 



= -2ks^ri sin (01s) - fcs,r2 sin (02s) (Appendix A. 21) 

-fcri,r2 sin (01s - 02s) , 

^02s 



fcs.ri sin (01s) — 2fcs,r2 sin (02s) (Appendix A. 22) 



dt 

+ kr^,r2 sin (01s - 02, 
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The only constraint we will make later on in our analysis is that such couplings need to have the 
same absolute value, a requirement consistent with asymptotic behavior of the learning equations (see 
Appendix A.5I. Right now, the question we want to answer is the following. Are there stable solutions 
to (Appendix A. 21 1 and (Appendix A. 22 1? If so, how do they depend on the couplings? 



To answer the above questions, we should first compute whether there are any fixed points for 



difi^erential equations (Appendix A. 21 1 and (Appendix A. 22). Since a fixed point is a point where the 
system remains if initially placed in it, they are characterized by having both time derivatives of (pij 
and (j)2j equal to zero. Thus, the fixed points are given by 



- 2fcs,ri sin (01s) - ks,r2 sin (02s) 



hs - 



-ks,ri sin {4>is) - 2ks.r2 sin ( 



hs) 

kri,r2 sin (01s - 02s) 



(Appendix A. 23) 
(Appendix A. 24) 



Because of the periodicity of the sine function, there are infinitely many fixed points. Since we are 
interested in phases, we do not need to examine all points, but it suffices to focus on the region 
determined by < 0is < 2tt and < 02s < 27r. The trivial fixed points, corresponding to each sine 
function in (Appendix A. 23 1 and (Appendix A. 24 1 being zero, are (0,0), (tt, 0), (0,7r), and (tTjTt). 
The other solutions depend on the values of the coupling constants. To simplify our analysis, let 
us consider the case where all couplings have the same magnitude, i.e. |A:s^ri| = I^ri,r2l = I^s,r2l- 
Since the relative coupling strengths are relevant for matters of stability, let us consider the following 



scenarios: 



ka 



k — ^s,ri — ^s,r2 



— h h = — 

Other cases, like 



»'s,ri 

ka' 



k 



s,r2 ■ 



h h 



— ks,ri — ^s,r2 — ^ri,r2i 

are contained in those, since k 



can be either positive or negative. Let us examine each case separately. 



AppGTidtx A.2.1. k — kg^j-^ — ^s,t2 — ^T\,r2 

For this case, in addition to {(0, 0), (0, tt), (tt, 0), (tt, tt)}, we have (|7r, — |7r) and (— |7r, |7r) as fixed 
points. Let us start with {(0,0), (0, tt), (7r,0), (tTjTt)}. In order to analyze their local stability, we lin- 



earize Kuramoto's equations (Appendix A. 21 ) and (Appendix A. 22 1 around {(0, 0), (0, tt), (tt, 0), (tt, tt)}. 
To linearize the equations, first we substitute 01^ and 02s by a + ei and b + 62, where a and b are the 
coordinates of the fixed points. Then we make a linear approximation for the sine function for values 
close to the fixed point, i.e. for ei and €2 being small. Below we show the linearized equations for 
different fixed points {a,b). 

For example, for (0, tt) we define 01^ — ei and 02s = tt 
pendix A. 22 1 become 



£2- Then (Appendix A.21I and (Ap- 



dei 

de2 
Itt 



—2k sin (ei) — k sin (tt + £2) 
-fcsin (ei - £2 - tt) , 
— fc sin (ei) — 2k sin (tt + 62) 
-l-fcsin (ei - £2 - tt) . 



(Appendix A. 25) 
(Appendix A. 26) 



Because close to the fixed points ei ^ 1 and e2 ^ 1, we make the approximation that sin(ei) « ei and 
sin(e2) ~ 62 , and (Appendix A. 25 1 and (Appendix A. 26 1 can be written as 



lit 
lit 



= — 2fcei + ke2 + k {ei — €2) ■ 
= —kei + 2ke2 — k (ei — €2) 



(Appendix A. 27) 
(Appendix A. 28) 
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or 



dei 
lit 

lit 



= —2kei + 3fce2. 



(Appendix A. 29) 
(Appendix A. 30) 



The eigenvalues for the hnearized (Appendix A. 29 1 and (Appendix A. 30 1 are —k and 3k. A fixed 
point is Liapunov stable if its eigenvalues are negative ( Guckenheimer and Holmes 1983). Therefore, 
for k = ks^n = kg.rz = kri,r2j the fixed point (0,7r) cannot be stable. 

We perform the same computations for the other fixed points. The linearized equations are 



dei 

lit 
de2 
lit 



for (0,0), 



once again, for (0, tt) 



for (7r,0), and 



de\ 

lit 
dt2 
lit 



dei 

m 

de2 
lit 



dei 
lit 
de2 
lit 



= — 3fcei, 
= -3ke2, 

-kei, 

—2kei + 3fce2, 

3/cei — 2ke2, 
-ke2, 

kei + 2fce2, 
2kei + fce2, 



(Appendix A. 31) 
(Appendix A. 32) 

(Appendix A. 33) 
(Appendix A. 34) 

(Appendix A. 35) 
(Appendix A. 36) 

(Appendix A. 37) 
(Appendix A. 38) 



for (tTjTt). Only (Appendix A. 31 1 and (Appendix A. 32 1 have negative eigenvalues, when fc > 0, thus 
corresponding to stable fixed points. 

Now let us examin e fixed points (I tt, — I tt) and (— ^tt, |7 r). Substituting ipis — ±|7r + ei {t) and 
02s — -f^T^ + £2 (t) in (Appendix A. 21 1 and (Appendix A. 22), we have 



1 dei 
kltt 



lde2 
k dt 



-2 sin I ig''' + ei 



sin ( =Fg7r + £2 



2 2 
- sin ( ±-7r + ei (t) ± -tt - 62 



sin ( ±-7r + ei ) — 2 sin ( T^"" + ^2 



,2 , ^ 2 

-sm ( ±-7r + ei [t) ± -tt - 62 



(Appendix A. 39) 



(Appendix A. 40) 
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Using the linear approximation for ei <C 1 and £2 <C 1 we get 



= T2sm^-.j-2cos^-7r)., 

±sin ( ^TT^ — cos f ^TT ) €2 

4 



— cos ( ^TT ) (ei — £2) , (Appendix A. 41) 



lde2 . /2 \ /2 



fcdt Tsm(^-.J-cos(^-Mei 

±2 sin ( -TT I — 2 cos | -tt ) £2 

±sin ( -TT 
3 

4 

3' 



+ C0S ( -TT ) (ei — £2) . (Appendix A. 42) 



or 



1 dci 3 /» ,. » <„x 

fc^ = 2^1' (Appendix A.43) 

1 c?e2 3 , . . , ,x 

" 2^^' (Appendix A.44) 

This point is stable if fc < 0. Thus, for k = ks^n = ks,r2 = Ki,r2i there are three stable fixed points: 
(0,0), (|7r, — Itt), and (— |7r, |7r). However, if fc > the only fixed point that is stable is (0,0), 
corresponding to the situation when all oscillators are exactly in phase. 

AppGTldtX A. 2. 2. Jv ^= kg ^ ^s.T2 — ^r\.r2 

Now the fixed points are (0,0), (0,7r), (7r,0), (7r,7r), (|7r, ^tt), and (— |7r,— ^tt). The linearized 
equations are 

= fcei, (Appendix A. 45) 

= 2fcei - 3fce2, (Appendix A.46) 



for (0,0), 



~dt 
de2 
~dt 



for (0,7r), 



dei 
lit 
de2 



= 3fcei, (Appendix A.47) 

= 3fce2, (Appendix A.48) 

-fcei - 2fce2, (Appendix A.49) 

— 2fcei — fce2, (Appendix A. 50) 
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for (tt, 0), and 



Itt 



= — 3fcei + 2ke2, 



(Appendix A. 51) 
(Appendix A. 52) 



for (tt, tt). Only (Appendix A.47I and (Appendix A.48I correspond to fixed points that can be stable, 
if fc < 0, as there are no other choices of k that would allow the other points to be stable. 

For fixed points (|7r, and (— ^Tr, — j^7r), we substitute — ±|7r + ei {t) and (/)2s = ±|7i' + e2 {t) 
in (Appendix A. 21) and (Appendix A. 22 1, and we have 



dei 
lit 



d£2 

~dt 



2fcsin I ±-7r + ei 



k sin ±-7r 
3 



£2 



, • , , 2 1 

-fcsm I ±-7r T gTT + ei - £2 



k sin I ig"" + £i 



- 2fcsin I ±-7r + e2 



, ■ 1 

+A;sin I igTT ip -TT + ei - £2 

Using the linear approximation for ei ^ 1 and e2 <C 1 we get 

1 dei 3 



Idea 
fc di 



:e2- 



(Appendix A. 53) 

(Appendix A. 54) 

(Appendix A. 55) 
(Appendix A. 56) 



Points (|7r, ^tt) and (— §7r, — ^tt) are stable if fc > 0. Thus, for fc = — fc^^n = ^s,r 
three relevant stable fixed points: (0,7r), (|7r, ^tt) and (— |7r, — ^tt). However, if fc < the only fixed 
point that is stable is (0, tt), corresponding to the situation when the oscillators s and ri are in phase 
and s and r2 are off phase by tt. 

ji.J)pGTldlX A, 2. 3. fc ^^i: kg j'^ ^ fcg ^~ 1^2 



For the points {(0, 0), (0, vr), (tt, O), (tt, tt)} the linearized form of (Appendix A. 55 ) and (Appendix A. 56 1 
around the stability points are 



for (0,0), 



dei 
~dt 
de2 
~dt 



dei 
~dt 
de2 
~dt 



for (0,^), 



dei 

lu 

de2 
Itt 



— 3fcei + 2fce2, 
fce2, 

-fcei - 2fce2, 
— 2fcei — fce2, 

= 3fcei, 
= 3fce2, 



(Appendix A. 57) 
(Appendix A. 58) 

(Appendix A. 59) 
(Appendix A. 60) 

(Appendix A. 61) 
(Appendix A. 62) 
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for (tt, 0), and 



dei 
lit 

lit 



kci, 

2kei — 3fce2, 



(Appendix A. 63) 
(Appendix A. 64) 



for (tt, tt). Only (Appendix A. 61 1 and (Appendix A. 62 1 correspond to fixed points that can be stable, 
if we choose k < 0. There are no choices of k that would allow the other points to be stable. 



The other stable points are (^tt, |7r) and {—Itt,—^tt). The linearized (Appendix A. 21 1 and (Ap- 



pendix A. 22 1 become 



1 dei 
k~dt 



lde2 
k dt 



-2 sin I ig"" + ei 



sin [ ±2"" + ^2 



, 1 2 

sm ( ±-7r + ei ;p -tt - 63 



. , 1 

— sm I i g'^T 



2 sin I ±2"" + £2 



, 1 2 

sm ( ±-7r + ei T - 62 



Using the linear approximation for ei ^ 1 and 62 <C 1 we get 



(Appendix A. 65) 



(Appendix A. 66) 



1 dei 


3 


k~dt 




lde2 


3 


k dt 




= h 


— —h — 

— "'S,r2 — 



This point is stable if fc > 0. Thus, for k = k^.n — — fcs.rg — ^ri,r2 there are three relevant stable fixed 
points: (7r,0), (^tt, |7r), and (— ^tt, —^tt), but if fc < only the point (7r,0) is stable, corresponding to 
oscillator r2 in phase with oscillator s, and ri off phase with s. 

.AppGTidzx A..^.J^. k ^= kg ip-y^ — kg — k^-^ 



For the points {(0,0), (0, tt), (7r,0), (tTjTt)}, linearizing (Appendix A. 55 1 and (Appendix A. 56) gives 



us 



d€2 

Itt 



for (0,0), 



for (0,^), 



Itt 
de2 
Itt 



dei 
lit 
de2 
lit 



= — 3fcei, 
= ke2j 

— 3fcei + 2fce2, 
ke2, 

fcei, 

2fcei — 3fce2, 



(Appendix A. 67) 
(Appendix A. 68) 

(Appendix A. 69) 
(Appendix A. 70) 

(Appendix A. 71) 
(Appendix A. 72) 
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^s,ri ^ 


ks,ri ^ 




(0 0) 


(-§7r, 37r) 






(0,7r) 


^s,ri — ^s,r2 — ^ri,r2 


(-^tt.-Itt) 


(^,0) 




(-iTT, |7r) 
(i7r,-37r) 


(7r,7r) 



Table A. 4: Stable fixed points for |A;s,ril = |fcs,r2l = I'i^ri,r2l- Since the absolute values of the coupling strengths are all 
the same, which fixed points are stable is determined by the relative signs of the couplings. 



for (tt, 0), and 



dei 
lit 
de2 

Itt 



3A;e2, 



(Appendix A. 73) 
(Appendix A. 74) 



for (tTjTt). Once again, only (Appendix A. 73 1 and (Appendix A. 74 1 correspond to fixed points that 
can be stable, if we choose fc < 0. 

The other fi xed points are (— ^tt, \ ti) and {\n ,^\'k ). Substituting (pis = tF + ei (^) E^nd 02s = 
±i7r + £2 {t) iu (Appendix A. 21 1 and (Appendix A. 22 1, after a linear approximation for ei ^ 1 and 
62 ^ 1, we obtain 



1 dei 
kltt 
Idea 
k dt 



:£2- 



(Appendix A. 75) 
(Appendix A. 76) 



This point is stable if A: > 0. For k 



(7i",7r), 



ks,ri = ^s,r2 = —fcri,r2 there are three relevant stable fixed points: 
(|7r,— Itt), and (— |7r, |7r)|^ However, if fc < the only fixed point that is stable is (tTjTt), 
corresponding to the situation when both oscillators ri and r2 are off phase with respect to oscillator 
s by TT. 

<0, 



The results of Sections Appendix A. 2.1 Appendix A. 2. 4 are summarized in Table A. 4 If k 



the relative signs of the other coupling constants determine unique stability points, corresponding to 
oscillator ri in phase with oscillator s and r2 off phase by tt (row 2) , r2 in phase and ri off phase by tt 
(row 3), and both ri and r2 off phase by tt (row 4). The only possibility for all oscillators ri, r2, and 
s to be in phase is when all couplings are positive. 



Appendix A. 3. General sufficient criteria 

In the section above, we showed how the fixed points behaved for couplings with the same strength 
but different signs. But, since the learning equations (28l-(42) and the initial conditions cannot 
guarantee couplings with the same strength, we need to show that a fixed point is stable under even 
if the couplings have different strengths. Here, we show a more general condition for stability. Let us 



^In fact we have, as discussed above, an infinite number of stable fixed points. But since we are only interested in 
phase differences, and the fixed points other than (|7r, and (— |7r, ^tt) are periodic, with periodicity 2tt, we omit 

them from our discussion. 
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take, for instance, the point (0,7r). Close to this point. Equations (Appendix A. 21 (-(Appendix A. 22), 
after hnearization, become 



dei 
~dt 

de2 
lit 



+ kri,r2 (ei - £2) , 



2k 



Equations (Appendix A. 77 1 and (Appendix A. 78 1 have eigenvalues 
and 



where 



A2 = 



D ^ k 



2 



r-i,r2 



D, 



(Appendix A. 77) 



(Appendix A. 78) 



(Appendix A. 79) 



(Appendix A. 80) 



k 



3,7-2 "'^1 ,r2 



The stability point (0,7r) is stable if the eigenvalues Ai and A2 are negative. Since A2 < implies 
Ai < 0, we focus on Equation (Appendix A. 80 1. As long as the couplings are real, we can show that 
D > 0. Thus, the requirement that in (Appendix A. 80 I, A2 is negative is equivalent to 

(^s,ri ^s,r2 ^ri,r^ 



> 



2 + 
2 

ri,r2 



k 

+k 
-k 



h h 

'^s,ri "Ti ,r2 



s,r2 "Ti ,r2 



which simplifies to 



^s,r2^ri,r2 ^ ^s,ri^ri,r2 



(Appendix A. 81) 

Equation (Appendix A. 81 1 is a sufiicient condition for the stability of the point (0, tt). Using the same 
technique, similar conditions can be obtained for different fixed points. 

Appendix A. 4- Time of convergence 

As we showed in Equation (Appendix A. 81 1, what determines whether a point is stable or not are 
the relations between the different oscillator couplings. For example. Equation (Appendix A. 81 1 could 
be satisfied for a set of fc's that are very small, such as ks,r2 = -01, fcri,r2 — -01, and kg^n = .0001. 
However, since we are interested in biologically relevant models, showing that the equations of motion 
asymptotically converges to a fixed point for these values is not sufficient. We need to show that such 
convergence can happen within a reasonably short amount of time Atr, such that responses associated 
to the fixed points are correctly selected, after effective reinforcement. In this section we will estimate 
the values of the couplings such that the we should expect fast convergence. To do this, let us focus 
on the case of ks ri = —ks,r2 = ~^ri,r2 = k, k > {), discussed in Appendix A. 2. 2 In this case, the 
Kuramoto equations are 



dt 



Jffl2s 

dt 



—2k sin (01s) + k sin ( 
-l-fcsin - 02s) , 
~k sin (01s) + 2k sin ( 

^Is - 02s) , 



— fc sin ( 



'J2s) 



(Appendix A. 82) 



(Appendix A. 83) 
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and for (0,7r) we define = ei and (j)2s — + £2- Then (Appendix A. 82 1 and (Appendix A. 83 1 
become 



dei 

~dt 

d€2 

~dt 



= -3fcei, (Appendix A. 84) 

= — 3fce2- (Appendix A. 85) 



We emphasize that equations are just an approximation for the system close to the fixed point (0,7r). 
However, because of the sine terms in Kuramoto's equation, this is a reasonably good approximation, 



with an error of the order of 0(ef, ej). Thus, for points close to (0, tt), the solutions to (Appendix A. 82 1 



and (Appendix A. 83 1 are 
and 



(i) = 0i, (0)e-3^*, 



(j>2s {t)^TT+[(j>2s {0)-Tt] e"3fc*, 

where (j^is (0) and (j)2s (0) are the initial conditions. This is an exponential decay converging to the 
fixed point, and of course neither of the solutions go, within a finite amount of time, to (0, tt). However, 
the constant r = 1 /3fc gives us the mean lifetime of the system, thus giving a measure of how fast the 
convergence to the fixed point is happening. Thus, if we expect the mean lifetime to be at least of the 
order of Atr = 200 ms, k needs to be approximately 2 Hz. For practical purposes, if we have fc ^ 2 Hz, 
our system should converge most of the time within the expected time At^, a result consistent with 
our numerical simulations. 

Appendix A. 5. Synchronization of oscillators during reinforcement 

We are interested in knowing the qualitative behavior of the phases when a reinforcement occurs. 
During reinforcement, the oscillators satisfy equations 



difs 
~dt 



dfn 
dt 



diprr^ 

~dt 



dt 
dt 
dt 



+Ko sin {ips - ujet) 

^0 - ks,ri sin ((/3s - ifrj 

— fcs,r2 sin {ips — (pr2) 1 (Appendix A. 86) 

Kq sin {iprt - Wet - TT (1 - 5e„,i)) 
+Wo - fcs.ri sin [Lpr^ - Vs) 

— fcri.ra sin (iy9ri ~ <y3r2 ) i (Appendix A. 87) 

Kq sin {Lpr2 - Wgi - TT (1 - Se„,2)) 
+OJO - fcs.ra sin ((^^2 - Vs) 

— fc^i.ra sin((pr2 ~ V'ri ) : (Appendix A. 88) 

e (Kq) [a cos {(ps — ^ri) — ks.n] , (Appendix A. 89) 

e (Kq) [a cos {ips ~ Vr^) ~ fcsxa] j (Appendix A. 90) 

e (-K^o) [acos (ipn ~ f>r2) " ^ri,r2] J (Appendix A. 91) 



where En is either 1 or 2 depending on which finite response is being reinforced, and 5ij is Kronecker's 
delta, i.e., 5e„,2 is one if En = 2 and zero otherwise. We show here that the first three equations lead 
the oscillators s, ri, and r2 to synchronize and phase lock with reinforcement oscillators ei and 62 if 
Kq is sufficiently large. We start with the approximation 

(Ps W Wo + Kq sin ((^s - UJet) , 
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for Kq ^ kr^,r2i^s,riiks,r2- exact solution to this equation can be found, namely 

/ r \ 
ifs = Lo^t + 2 arctan — , 

where T = Kq — y/ — tan ^| {t + c-\) \J — K^^ , 6^0 = u)o — ^^e^ c\ is an integration constant, 
and we assumed w,. 7^ cjq- This equation shows a term with frequency cj,. plus another term that 
depends on t in a more complicated way. Let us assume, as we did in the main text, that is large 
enough, such that K^^ > |a;o — We|- This implies that the term inside the square root in the above 
equation is negative. Let 

and we have 



where 



= + 2 arctan (F (t)) , 

ii'o-iKtan(i(t + ci)m) 
F{t) = ^ — -. 

<JeO 

If we take the limit when t goes to infinity for the arctan term we have 



lim arctan {F (t)) = arctan 



We - Wo 



or 



arctan 



Kq + \JK^- (wo - We)' 
We - Wo 



If Kq ^ I Wo — We I, then 



lim arctan {F (t)) w arctan 

-4 



_2Ko_ 

We - Wo 



So, for large Kq and t, 
and 

We now turn to 
The solution is 



<Pri ~ ^et ± TT. 
ipr2 = Wo + -ft'o sin {ipr2 " '^e* - 71") • 

ifr^ = i^et — 2 arctan I — , 

\(>e0j 



where T' = Kq + ^5% - K§ tan (i {t + C2) ^ — Kq^ , and C2 is another integration constant. Fol- 
lowing the same arguments as before, we obtain 



lim arctan {F {t)) ~ arctan (0) 



42 



or, for large Kq and t, 



Of course, the above arguments only tell us the behavior of the equations when t goes to infinity, 
but in our models we deal with finite times. To better understand how fast the solution converges to 
uiet ± TT or Wet, let US rewrite equation 



<y9s = ujet + 2 arctan — 



3eO 



in terms of dimensionless quantities. Let 7 — (we — wq) / K^, then 

LPs = i^et 

+ 2 arctan - tan (^±^1^^^ 

Since 7 is real and 7 <C 1 for large values of Kq, we rewrite 



2 arctan 



^ + yr^tanh ^ 



We're only interested in the last term. 



2 arctan 



But for large values of t, 



which is equal to 



^ 1 + v/r^tanh ( ^»'*+%'^^ ^ ^ 



tanh 



e 2 



(Appendix A. 92) 



(Appendix A. 93) 



goes to 1. In fact, (Appendix A. 93 1, and therefore (Appendix A. 92 1, shows a characteristic time 

2 2 



Thus, for t > tc and for large Kq (i.e., Kq 3> |we ~ '^oDi ^ good approximation for ips is 
Similar arguments can be made for ifn and ipr2 ■ 
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Appendix A. 6. Effectiveness of Reinforcement for the Oscillator Model 

Here we show how to compute K' from the behavioral parameter 9 and the other oscillator param- 
eters. First, Kq satisfies the normal distribution density 

p{Ko) = -^—e^^(^°~^°y , (Appendix A.94) 

(TV27r 

where a is the standard deviation. The sigmoid function 

^ = l + e-^^°Ko-K') (Appendix A.95) 

determines whether the couplings are affected. We also assume that the time of reinforcement, At^, 
is large compared to l/e, as discussed in the text. If 7 ^ 1, e(-f4'o) approaches a Heaviside function 
H{Kq) as e(ifo) ~ ^oH{Kq — K'). Thus, for large values of 7, learning happens only if Kq > K' , and 
the probability of an oscillator reinforcement being effective 

— / e'^i^"^'^") dKo, (Appendix A.96) 

/2n Jk' 



a\'2n Jk 
or 

= ^ ^1 + erf (^^ ^°~^' ^ ^ . (Appendix A.97) 

Since 6 is monotonically decreasing with K' , it is possible to solve the above equation for K' . For 
example, if we set 9' — .19 we obtain that 

Kq-K' ^ -0.8779(7. (Appendix A.98) 



Choosing cr = 10 and = 100, from (Appendix A.98 1 we obtain K' = 187.79. 
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